Mike Middleton, March 2022
https://orcid.org/0000-0001-5813-6347
Colab Notebook: Live code (Must be logged into Google. Select Google Colaboratory, at the top of the screen, if page opens as raw code)
HTML: Read only
HTML: Read only topographic
Pre-processed data and images are available for download (without the need to run the code in these files) here:
https://github.com/MikeDairsie/Hillforts-Primer.
To download data, save images or change the background image to show the topography, first save a copy of this document into your Google Drive folder. Once saved, change download_data, save_images and/or show_topography to True in the code blocks below, save, and then select Runtime>Run all in the main menu above to rerun the code. If selected, running the code will start the download and saving of files. Each document downloads a number of data packages, so you may be prompted to allow multiple downloads. Be patient: downloads may take a little time to start after the document has finished running. Each part of the Hillforts Primer is independent, so download_data, save_images and show_topography must be enabled in each document where this functionality is required. Saving images will mount your Google Drive folder, and you will be asked to allow access. Selecting show_topography changes the background image to a colour topographic map. With show_topography set to True, the maps show only the distribution of the selected data; they do not show the overall distribution as a grey background layer, as is seen when using the simple coastal outlines.
download_data = False
save_images = False
show_topography = False
The initial sections of all the Hillforts Primer documents set up the coding environment and define functions used to plot, reprocess and save the data. If you would like to bypass the setup, please use the following link:
Go to Review Data Part 3.
The Atlas of Hillforts of Britain and Ireland data is made available under the licence, Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). This allows for redistribution, sharing and transformation of the data, as long as the results are credited and made available under the same licence conditions.
The data was downloaded from The Atlas of Hillforts of Britain and Ireland website as a csv file (comma separated values) and saved onto the author’s GitHub repository thus enabling the data to be used by this document.
Lock, G. and Ralston, I. 2017. Atlas of Hillforts of Britain and Ireland. [ONLINE] Available at: https://hillforts.arch.ox.ac.uk
Rest services: https://maps.arch.ox.ac.uk/server/rest/services/hillforts/Atlas_of_Hillforts/MapServer
Licence: https://creativecommons.org/licenses/by-sa/4.0/
Help: https://hillforts.arch.ox.ac.uk/assets/help.pdf
Data Structure: https://maps.arch.ox.ac.uk/assets/data.html
Hillforts: Britain, Ireland and the Nearer Continent (Sample): https://www.archaeopress.com/ArchaeopressShop/DMS/A72C523E8B6742ED97BA86470E747C69/9781789692266-sample.pdf
import sys
print(f'Python: {sys.version}')
import sklearn
print(f'Scikit-Learn: {sklearn.__version__}')
import pandas as pd
print(f'pandas: {pd.__version__}')
import numpy as np
print(f'numpy: {np.__version__}')
%matplotlib inline
import matplotlib
print(f'matplotlib: {matplotlib.__version__}')
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.patches as mpatches
import matplotlib.patches as patches
from matplotlib.cbook import boxplot_stats
from matplotlib.lines import Line2D
import seaborn as sns
print(f'seaborn: {sns.__version__}')
sns.set(style="whitegrid")
import scipy
print(f'scipy: {scipy.__version__}')
from scipy import stats
from scipy.stats import gaussian_kde
import os
import collections
import math
import random
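import PIL.Image # The Image submodule is imported explicitly because PIL.Image.open is used below.
import urllib.request # The request submodule is imported explicitly because urllib.request.urlopen is used below.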
random.seed(42) # A random seed is used to ensure that the random numbers created are the same for each run of this document.
from slugify import slugify
# Import Google colab tools to access Drive
from google.colab import drive
Python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
Scikit-Learn: 1.2.2
pandas: 1.5.3
numpy: 1.22.4
matplotlib: 3.7.1
seaborn: 0.12.2
scipy: 1.10.1
The following functions will be used to plot data later in the document.
def show_records(plt, plot_data):
    text_colour = 'k'
    if show_topography:
        text_colour = 'w'
    plt.annotate(str(len(plot_data))+' records', xy=(-1180000, 6420000), xycoords='data', ha='left', color=text_colour)
def get_backgrounds():
    if show_topography:
        backgrounds = ["hillforts-topo-01.png",
                       "hillforts-topo-north.png",
                       "hillforts-topo-northwest-plus.png",
                       "hillforts-topo-northwest-minus.png",
                       "hillforts-topo-northeast.png",
                       "hillforts-topo-south.png",
                       "hillforts-topo-south-plus.png",
                       "hillforts-topo-ireland.png",
                       "hillforts-topo-ireland-north.png",
                       "hillforts-topo-ireland-south.png"]
    else:
        backgrounds = ["hillforts-outline-01.png",
                       "hillforts-outline-north.png",
                       "hillforts-outline-northwest-plus.png",
                       "hillforts-outline-northwest-minus.png",
                       "hillforts-outline-northeast.png",
                       "hillforts-outline-south.png",
                       "hillforts-outline-south-plus.png",
                       "hillforts-outline-ireland.png",
                       "hillforts-outline-ireland-north.png",
                       "hillforts-outline-ireland-south.png"]
    return backgrounds

def get_bounds():
    bounds = [[-1200000,220000,6400000,8700000],
              [-1200000,220000,7000000,8700000],
              [-1200000,-480000,7000000,8200000],
              [-900000,-480000,7100000,8200000],
              [-520000, 0,7000000,8700000],
              [-800000,220000,6400000,7100000],
              [-1200000,220000,6400000,7100000],
              [-1200000,-600000,6650000,7450000],
              [-1200000,-600000,7050000,7450000],
              [-1200000,-600000,6650000,7080000]]
    return bounds
def show_background(plt, ax, location=""):
    backgrounds = get_backgrounds()
    bounds = get_bounds()
    folder = "https://raw.githubusercontent.com/MikeDairsie/Hillforts-Primer/main/hillforts-topo/"
    # Position of each region code in the backgrounds and bounds lists.
    # Any other value (including the default "") selects the full map at position 0.
    region_index = {"n": 1, "nw+": 2, "nw-": 3, "ne": 4, "s": 5, "s+": 6, "i": 7, "in": 8, "is": 9}
    i = region_index.get(location, 0)
    background = os.path.join(folder, backgrounds[i])
    img = np.array(PIL.Image.open(urllib.request.urlopen(background)))
    ax.imshow(img, extent=bounds[i])
def get_counts(data):
    data_counts = []
    for col in data.columns:
        count = len(data[data[col] == 'Yes'])
        data_counts.append(count)
    return data_counts

def add_annotation_plot(ax):
    ax.annotate("Middleton, M. 2022, Hillforts Primer", size='small', color='grey', xy=(0.01, 0.01), xycoords='figure fraction', horizontalalignment = 'left')
    ax.annotate("Source Data: Lock & Ralston, 2017. hillforts.arch.ox.ac.uk", size='small', color='grey', xy=(0.99, 0.01), xycoords='figure fraction', horizontalalignment = 'right')

def add_annotation_l_xy(ax):
    ax.annotate("Middleton, M. 2022, Hillforts Primer", size='small', color='grey', xy=(0.01, 0.035), xycoords='figure fraction', horizontalalignment = 'left')
    ax.annotate("Source Data: Lock & Ralston, 2017. hillforts.arch.ox.ac.uk", size='small', color='grey', xy=(0.99, 0.035), xycoords='figure fraction', horizontalalignment = 'right')
def plot_bar_chart(data, split_pos, x_label, y_label, title):
    fig = plt.figure(figsize=(12,5))
    ax = fig.add_axes([0,0,1,1])
    # Drop the leading parts of each column name, keeping everything from split_pos onwards.
    x_data_new = ["_".join(x.split("_")[split_pos:]) for x in data.columns]
    y_data = get_counts(data)
    ax.bar(x_data_new,y_data)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    add_annotation_plot(ax)
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()

def plot_bar_chart_using_two_tables(x_data, y_data, x_label, y_label, title):
    fig = plt.figure(figsize=(12,5))
    ax = fig.add_axes([0,0,1,1])
    ax.bar(x_data,y_data)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    add_annotation_plot(ax)
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()

def plot_bar_chart_numeric(data, split_pos, x_label, y_label, title, n_bins):
    fig = plt.figure(figsize=(12,5))
    ax = fig.add_axes([0,0,1,1])
    data[x_label].plot(kind='hist', bins = n_bins)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    add_annotation_plot(ax)
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()

def plot_bar_chart_value_counts(data, x_label, y_label, title):
    fig = plt.figure(figsize=(12,5))
    ax = fig.add_axes([0,0,1,1])
    df = data.value_counts()
    x_data = df.index.values
    y_data = df.values
    ax.bar(x_data,y_data)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    add_annotation_plot(ax)
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()
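def get_bins(data, bins_count):
    # Returns the number of histogram bins. If bins_count is given it is used as the bin width.
    data_range = data.max() - data.min()
    print(bins_count)
    if bins_count is not None:
        # range() only accepts integers, so the limits are cast in case the data is float.
        x_bins = [x for x in range(int(data.min()), int(data.max()), bins_count)]
        n_bins = len(x_bins)
    else:
        n_bins = int(data_range)
        if n_bins < 10:
            multi = 10
            while n_bins < 10:
                multi *= 10
                n_bins = int(data_range * multi)
        elif n_bins > 100:
            # The number of bins must be an integer, so the division happens before the cast.
            n_bins = int(data_range / 10)
    return n_bins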
def plot_histogram(data, x_label, title, bins_count = None):
    n_bins = get_bins(data, bins_count)
    fig = plt.figure(figsize=(12,5))
    ax = fig.add_axes([0,0,1,1])
    ax.set_xlabel(x_label)
    ax.set_ylabel('Count')
    plt.ticklabel_format(style='plain')
    plt.hist(data, bins=n_bins)
    plt.title(get_print_title(title))
    add_annotation_plot(ax)
    save_fig(title)
    plt.show()

def plot_continuous(data, x_lable, title):
    fig = plt.figure(figsize=(12,8))
    ax = fig.add_axes([0,0,1,1])
    ax.set_xlabel(x_lable)
    plt.plot(data, linewidth=4)
    plt.ticklabel_format(style='plain')
    plt.title(get_print_title(title))
    add_annotation_plot(ax)
    save_fig(title)
    plt.show()
# box plot
def plot_data_range(data, feature, o="v"):
    fig = plt.figure(figsize=(12,8))
    ax = fig.add_axes([0,0,1,1])
    ax.set_xlabel(feature)
    add_annotation_plot(ax)
    plt.title(get_print_title(feature + " Range"))
    plt.ticklabel_format(style='plain')
    if o == "v":
        sns.boxplot(data=data, orient="v")
    else:
        sns.boxplot(data=data, orient="h")
    save_fig(feature + " Range")
    plt.show()
    bp = boxplot_stats(data)
    low = bp[0].get('whislo')
    q1 = bp[0].get('q1')
    median = bp[0].get('med')
    q3 = bp[0].get('q3')
    high = bp[0].get('whishi')
    return [low, q1, median, q3, high]
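As a rough illustration of the five values returned by plot_data_range, the sketch below calls boxplot_stats directly on a small list. The sample values are invented for illustration and are not Atlas data.

```python
from matplotlib.cbook import boxplot_stats

# Invented sample values (not Atlas data); 100 is an obvious outlier.
sample = [1, 2, 3, 4, 100]

# boxplot_stats returns one dictionary of statistics per dataset.
stats = boxplot_stats(sample)[0]

# The same five keys that plot_data_range reads.
five_numbers = [stats['whislo'], stats['q1'], stats['med'], stats['q3'], stats['whishi']]
print(five_numbers)

# Values beyond the whiskers are reported separately as fliers.
print(stats['fliers'])
```

Note that the upper whisker stops at the largest value within 1.5 times the interquartile range of the third quartile, so the outlier is listed under fliers and is not the returned high value.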
def location_XY_plot():
    plt.ticklabel_format(style='plain')
    plt.xlim(-1200000,220000)
    plt.ylim(6400000,8700000)
    add_annotation_l_xy(plt)

def add_grey(region=''):
    if not show_topography:
        # plots all the hillforts as a grey background
        loc = location_data.copy()
        if region == 's':
            loc = loc[loc['Location_Y'] < 8000000].copy()
            loc = loc[loc['Location_X'] > -710000].copy()
        elif region == 'ne':
            loc = loc[loc['Location_Y'] < 8000000].copy()
            loc = loc[loc['Location_X'] > -800000].copy()
        plt.scatter(loc['Location_X'], loc['Location_Y'], c='Silver')
def plot_over_grey_numeric(merged_data, a_type, title, extra="", inner=False, fringe=False, oxford=False,swindon=False):
    plot_data = merged_data
    fig, ax = plt.subplots(figsize=(14.2 * 0.66, 23.0 * 0.66))
    show_background(plt, ax)
    location_XY_plot()
    add_grey()
    patches = add_oxford_swindon(oxford,swindon)
    plt.scatter(plot_data['Location_X'], plot_data['Location_Y'], c='Red')
    if fringe:
        f_for_legend = add_21Ha_fringe()
        patches.append(f_for_legend)
    if inner:
        i_for_legend = add_21Ha_line()
        patches.append(i_for_legend)
    show_records(plt, plot_data)
    plt.legend(loc='upper left', handles= patches)
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()

def plot_over_grey_boundary(merged_data, a_type, boundary_type):
    plot_data = merged_data[merged_data[a_type] == boundary_type]
    fig, ax = plt.subplots(figsize=(9.47, 15.33))
    show_background(plt, ax)
    location_XY_plot()
    add_grey(region='')
    plt.scatter(plot_data['Location_X'], plot_data['Location_Y'], c='Red')
    show_records(plt, plot_data)
    plt.title(get_print_title('Boundary_Type: ' + boundary_type))
    save_fig('Boundary_Type_' + boundary_type)
    plt.show()
    print(f'{round((len(plot_data)/len(merged_data)*100), 2)}%')

def plot_density_over_grey(data, data_type):
    new_data = data.copy()
    new_data = new_data.drop(['Density'], axis=1)
    new_data = add_density(new_data)
    fig, ax = plt.subplots(figsize=((14.2 * 0.66)+2.4, 23.0 * 0.66))
    show_background(plt, ax)
    location_XY_plot()
    add_grey()
    plt.scatter(new_data['Location_X'], new_data['Location_Y'], c=new_data['Density'], cmap=cm.rainbow, s=25)
    plt.colorbar(label='Density')
    plt.title(get_print_title(f'Density - {data_type}'))
    save_fig(f'Density_{data_type}')
    plt.show()
def add_21Ha_line():
    x_values = [-367969, -344171, -263690, -194654, -130542, -119597, -162994, -265052]#, -304545]
    y_values = [7019842, 6944572, 6850593, 6779602, 6735058, 6710127, 6684152, 6663609]#, 6611780]
    plt.plot(x_values, y_values, 'k', ls='-', lw=15, alpha=0.25, label = '≥ 21 Ha Line')
    add_to_legend = Line2D([0], [0], color='k', lw=15, alpha=0.25, label = '≥ 21 Ha Line')
    return add_to_legend

def add_21Ha_fringe():
    x_values = [-367969,-126771,29679,-42657,-248650,-304545,-423647,-584307,-367969]
    y_values = [7019842,6847138,6671658,6596650,6554366,6611780,6662041,6752378,7019842]
    plt.plot(x_values, y_values, 'k', ls=':', lw=5, alpha=0.45, label = '≥ 21 Ha Fringe')
    add_to_legend = Line2D([0], [0], color='k', ls=':', lw=5, alpha=0.45, label = '≥ 21 Ha Fringe')
    return add_to_legend

def add_oxford_swindon(oxford=False,swindon=False):
    # plots a circle over Swindon & Oxford
    radius = 50
    marker_size = (2*radius)**2
    patches = []
    if oxford:
        plt.scatter(-144362,6758380, c='dodgerblue', s=marker_size, alpha=0.50)
        b_patch = mpatches.Patch(color='dodgerblue', label='Oxford orbit')
        patches.append(b_patch)
    if swindon:
        plt.scatter(-197416, 6721977, c='yellow', s=marker_size, alpha=0.50)
        y_patch = mpatches.Patch(color='yellow', label='Swindon orbit')
        patches.append(y_patch)
    return patches
def plot_over_grey(merged_data, a_type, yes_no, extra="", inner=False, fringe=False, oxford=False,swindon=False):
    # Plots the selected data over the grey dots. yes_no controls whether the data is filtered for positive or negative values.
    plot_data = merged_data[merged_data[a_type] == yes_no]
    fig, ax = plt.subplots(figsize=(14.2 * 0.66, 23.0 * 0.66))
    show_background(plt, ax)
    location_XY_plot()
    add_grey()
    patches = add_oxford_swindon(oxford,swindon)
    plt.scatter(plot_data['Location_X'], plot_data['Location_Y'], c='Red')
    if fringe:
        f_for_legend = add_21Ha_fringe()
        patches.append(f_for_legend)
    if inner:
        i_for_legend = add_21Ha_line()
        patches.append(i_for_legend)
    show_records(plt, plot_data)
    plt.legend(loc='upper left', handles= patches)
    plt.title(get_print_title(f'{a_type} {extra}'))
    save_fig(f'{a_type}_{extra}')
    plt.show()
    print(f'{round((len(plot_data)/len(merged_data)*100), 2)}%')
    return plot_data
def plot_type_values(data, data_type, title):
    new_data = data.copy()
    fig, ax = plt.subplots(figsize=((14.2 * 0.66)+2.4, 23.0 * 0.66))
    show_background(plt, ax)
    location_XY_plot()
    plt.scatter(new_data['Location_X'], new_data['Location_Y'], c=new_data[data_type], cmap=cm.rainbow, s=25)
    plt.colorbar(label=data_type)
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()

def bespoke_plot(plt, title):
    add_annotation_plot(plt)
    plt.ticklabel_format(style='plain')
    plt.title(get_print_title(title))
    save_fig(title)
    plt.show()

def get_proportions(date_set):
    total = sum(date_set) - date_set[-1]
    newset = []
    for entry in date_set[:-1]:
        newset.append(round(entry/total,2))
    return newset
def plot_dates_by_region(nw,ne,ni,si,s, features):
    fig = plt.figure(figsize=(12,5))
    ax = fig.add_axes([0,0,1,1])
    # Drop the first two parts of each column name and leave out the final column.
    x_data_new = ["_".join(x.split("_")[2:]) for x in nw[features].columns][:-1]
    set1_name = 'NW'
    set2_name = 'NE'
    set3_name = 'N Ireland'
    set4_name = 'S Ireland'
    set5_name = 'South'
    set1 = get_proportions(get_counts(nw[features]))
    set2 = get_proportions(get_counts(ne[features]))
    set3 = get_proportions(get_counts(ni[features]))
    set4 = get_proportions(get_counts(si[features]))
    set5 = get_proportions(get_counts(s[features]))
    X_axis = np.arange(len(x_data_new))
    budge = 0.25
    plt.bar(X_axis - 0.55 + budge, set1, 0.3, label = set1_name)
    plt.bar(X_axis - 0.4 + budge, set2, 0.3, label = set2_name)
    plt.bar(X_axis - 0.25 + budge, set3, 0.3, label = set3_name)
    plt.bar(X_axis - 0.1 + budge, set4, 0.3, label = set4_name)
    plt.bar(X_axis + 0.05 + budge, set5, 0.3, label = set5_name)
    plt.xticks(X_axis, x_data_new)
    plt.xlabel('Dating')
    plt.ylabel('Proportion of Total Dated Hillforts in Region')
    title = 'Proportions of Dated Hillforts by Region'
    plt.title(title)
    plt.legend()
    add_annotation_plot(ax)
    save_fig(title)
    plt.show()
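The get_proportions helper above leaves the final entry of the list out of both the total and the result. A minimal sketch of that behaviour follows, using invented counts and assuming the final entry is the count of hillforts with an unknown date.

```python
def get_proportions(date_set):
    # Same logic as the helper above: the final entry is excluded from
    # the total and from the returned list.
    total = sum(date_set) - date_set[-1]
    return [round(entry / total, 2) for entry in date_set[:-1]]

# Invented counts for three date ranges, followed by an assumed 'Unknown' count.
counts = [2, 6, 2, 10]
proportions = get_proportions(counts)
print(proportions)
```

Because the final count is removed from the total, the proportions describe the dated records only, which matches the y-axis label used in plot_dates_by_region.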
The following functions will be used to confirm that features are not lost or forgotten when splitting the data.
def test_numeric(data):
    temp_data = data.copy()
    columns = data.columns
    out_cols = ['Feature','Entries', 'Numeric', 'Non-Numeric', 'Null']
    feat, ent, num, non, nul = [],[],[],[],[]
    for col in columns:
        if temp_data[col].dtype == 'object':
            feat.append(col)
            temp_data[col+'_num'] = temp_data[col].str.isnumeric()
            entries = temp_data[col].notnull().sum()
            true_count = temp_data[col+'_num'][temp_data[col+'_num'] == True].sum()
            null_count = temp_data[col].isna().sum()
            ent.append(entries)
            num.append(true_count)
            non.append(entries-true_count)
            nul.append(null_count)
        else:
            print(f'{col} {temp_data[col].dtype}')
    summary = pd.DataFrame(list(zip(feat, ent, num, non, nul)))
    summary.columns = out_cols
    return summary
def find_duplicated(numeric_data, text_data, encodeable_data):
    d = False
    all_columns = list(numeric_data.columns) + list(text_data.columns) + list(encodeable_data.columns)
    duplicate = [item for item, count in collections.Counter(all_columns).items() if count > 1]
    if duplicate:
        print(f"There are duplicate features: {duplicate}")
        d = True
    return d

def test_data_split(main_data, numeric_data, text_data, encodeable_data):
    m = False
    split_features = list(numeric_data.columns) + list(text_data.columns) + list(encodeable_data.columns)
    missing = list(set(main_data)-set(split_features))
    if missing:
        print(f"There are missing features: {missing}")
        m = True
    return m

def review_data_split(main_data, numeric_data, text_data, encodeable_data = pd.DataFrame()):
    d = find_duplicated(numeric_data, text_data, encodeable_data)
    m = test_data_split(main_data, numeric_data, text_data, encodeable_data)
    if d != True and m != True:
        print("Data split good.")
def find_duplicates(data):
    # duplicated(keep=False) flags every row that has at least one identical copy; summing the flags counts them.
    print(f'{data.duplicated(keep=False).sum()} duplicates.')
def count_yes(data):
    total = 0
    for col in data.columns:
        count = len(data[data[col] == 'Yes'])
        total+= count
        print(f'{col}: {count}')
    print(f'Total yes count: {total}')
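The two checks behind review_data_split can be sketched with plain lists of column names. The lists below are hypothetical and deliberately contain one repeated and one forgotten feature, so both checks report something.

```python
import collections

# Hypothetical column lists, invented for illustration.
main_columns = ['Location_X', 'Location_Y', 'Main_Site_Name', 'Dating_Date_Unknown']
numeric_columns = ['Location_X', 'Location_Y']
text_columns = ['Main_Site_Name', 'Location_X']  # 'Location_X' is repeated on purpose
encodeable_columns = []                          # 'Dating_Date_Unknown' is left out on purpose

split_columns = numeric_columns + text_columns + encodeable_columns

# A feature assigned to more than one subset appears more than once in the combined list.
duplicates = [name for name, count in collections.Counter(split_columns).items() if count > 1]

# A feature that was never assigned is in the source but in none of the subsets.
missing = sorted(set(main_columns) - set(split_columns))

print(duplicates)
print(missing)
```

A split is only reported as good when both lists are empty.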
The following functions will be used to update null values.
def fill_nan_with_minus_one(data, feature):
    new_data = data.copy()
    new_data[feature] = data[feature].fillna(-1)
    return new_data

def fill_nan_with_NA(data, feature):
    new_data = data.copy()
    new_data[feature] = data[feature].fillna("NA")
    return new_data
def test_numeric_value_in_feature(feature, value):
    # Counts how many entries in the feature equal the given value.
    test = feature.isin([value]).sum()
    return test
def test_catagorical_value_in_feature(dataframe, feature, value):
    test = dataframe[feature][dataframe[feature] == value].count()
    return test

def test_cat_list_for_NA(dataframe, cat_list):
    for val in cat_list:
        print(val, test_catagorical_value_in_feature(dataframe, val,'NA'))

def test_num_list_for_minus_one(dataframe, num_list):
    for val in num_list:
        feature = dataframe[val]
        print(val, test_numeric_value_in_feature(feature, -1))

def update_cat_list_for_NA(dataframe, cat_list):
    new_data = dataframe.copy()
    for val in cat_list:
        new_data = fill_nan_with_NA(new_data, val)
    return new_data

def update_num_list_for_minus_one(dataframe, cat_list):
    new_data = dataframe.copy()
    for val in cat_list:
        new_data = fill_nan_with_minus_one(new_data, val)
    return new_data
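A minimal sketch of the null-handling approach used by the functions above: missing numeric values become -1, missing text values become "NA", and the placeholders are then counted. The values in the sample table are invented; only the column names follow the Atlas header.

```python
import numpy as np
import pandas as pd

# Invented values for illustration; the column names follow the Atlas header.
sample = pd.DataFrame({
    'Landscape_Altitude': [276.0, np.nan, 150.0],
    'Main_Alt_Name': ['Aconbury Beacon', None, None],
})

# Work on a copy so the source table keeps its null values.
filled = sample.copy()
filled['Landscape_Altitude'] = filled['Landscape_Altitude'].fillna(-1)
filled['Main_Alt_Name'] = filled['Main_Alt_Name'].fillna('NA')

# Count the placeholders, in the same way as the test_* helpers above.
minus_one_count = int(filled['Landscape_Altitude'].isin([-1]).sum())
na_count = int((filled['Main_Alt_Name'] == 'NA').sum())
print(minus_one_count, na_count)
```

Using -1 as the placeholder assumes that -1 can never be a genuine value for the feature, which should be confirmed for each feature before it is filled.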
def add_density(data):
    new_data = data.copy()
    xy = np.vstack([new_data['Location_X'],new_data['Location_Y']])
    new_data['Density'] = gaussian_kde(xy)(xy)
    return new_data
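To show what add_density computes, the sketch below runs the same gaussian_kde call on a handful of invented coordinates: five points close together and one isolated point. The coordinates are not Atlas data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Invented coordinates: five clustered points and one isolated point.
x = np.array([0.0, 1.0, 0.0, 1.0, 0.5, 10.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 0.5, 10.0])

# gaussian_kde expects an array of shape (dimensions, points).
xy = np.vstack([x, y])

# Fit the estimator on the points, then evaluate it at those same points.
density = gaussian_kde(xy)(xy)

print(density.shape)
print(density[4] > density[5])  # centre of the cluster compared with the isolated point
```

The density value is relative to the set of points passed in, so values from different data selections are not directly comparable.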
fig_no = 0
part = 'Part03'
IMAGES_PATH = r'/content/drive/My Drive/'
fig_list = pd.DataFrame(columns=['fig_no', 'file_name', 'title'])
topo_txt = ""
if show_topography:
    topo_txt = "-topo"

def get_file_name(title):
    file_name = slugify(title)
    return file_name

def get_print_title(title):
    title = title.replace("_", " ")
    title = title.replace("-", " ")
    title = title.replace(",", ";")
    return title

def format_figno(no):
    length = len(str(no))
    fig_no = ''
    for i in range(3-length):
        fig_no = fig_no + '0'
    fig_no = fig_no + str(no)
    return fig_no
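The zero padding performed by format_figno keeps saved figures in numerical order when file names are sorted alphabetically. As a sketch, the same result can be obtained for non-negative integers with str.zfill or a format specifier; the comparison below is an illustration, not part of the original workflow.

```python
def format_figno(no):
    # Same logic as the helper above: left-pad with zeros to three characters.
    length = len(str(no))
    fig_no = ''
    for i in range(3 - length):
        fig_no = fig_no + '0'
    return fig_no + str(no)

# Numbers longer than three digits are returned unchanged by all three approaches.
for number in (1, 42, 123, 1234):
    assert format_figno(number) == str(number).zfill(3) == f'{number:03d}'

print(format_figno(7))  # 007
```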
if save_images:
    drive.mount('/content/drive')
    os.getcwd()
Mounted at /content/drive
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    # Saves the current figure to IMAGES_PATH and logs it in fig_list. Does nothing unless save_images is True.
    global fig_no
    if save_images:
        #IMAGES_PATH = r'/content/drive/My Drive/Colab Notebooks/Hillforts_Primer_Images/HP_Part_03_images/'
        fig_no+=1
        fig_no_txt = format_figno(fig_no)
        file_name = get_file_name(f'{part}_{fig_no_txt}')
        file_name = f'hillforts_primer_{file_name}{topo_txt}.{fig_extension}'
        fig_list.loc[len(fig_list)] = [fig_no, file_name, get_print_title(fig_id)]
        path = os.path.join(IMAGES_PATH, file_name)
        print("Saving figure", file_name)
        if tight_layout:
            plt.tight_layout()
        plt.savefig(path, format=fig_extension, dpi=resolution, bbox_inches='tight')
The source csv file is loaded and the first two rows are displayed to confirm the load was successful. Note that, to the left, an index has been added automatically. This index will be used frequently when splitting and remerging data extracts.
hillforts_csv = r"https://raw.githubusercontent.com/MikeDairsie/Hillforts-Primer/main/hillforts-atlas-source-data-csv/hillforts.csv"
hillforts_data = pd.read_csv(hillforts_csv, index_col=False)
pd.set_option('display.max_columns', None, 'display.max_rows', None)
hillforts_data.head(2)
<ipython-input-54-2b53084ab660>:2: DtypeWarning: Columns (10,12,68,83,84,85,86,165,183) have mixed types. Specify dtype option on import or set low_memory=False. hillforts_data = pd.read_csv(hillforts_csv, index_col=False)
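The DtypeWarning above is raised when pandas reads a large file in chunks and infers different types for the same column in different chunks. It does not stop the load. The sketch below shows the two options named in the warning, applied to a tiny invented stand-in for the Atlas csv. The values are made up, and the assumption that the mixed column should be read as text would need checking for each affected column.

```python
import io
import pandas as pd

# Invented stand-in for the Atlas csv: 'Main_HER_ID' holds both numbers and text.
csv_text = "OBJECTID,Main_HER_ID,Landscape_Altitude\n1,910,276.0\n2,MHE52,150.0\n"

# dtype fixes the type of a named column; low_memory=False reads the file in one pass.
sample = pd.read_csv(io.StringIO(csv_text), dtype={'Main_HER_ID': str}, low_memory=False)

# Every entry in the mixed column is now text.
print(sample['Main_HER_ID'].tolist())

# The automatically added index runs from 0 in file order.
print(list(sample.index))
```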
| OBJECTID | Main_Atlas_Number | Main_Country_Code | Main_Country | Main_Title_Name | Main_Site_Name | Main_Alt_Name | Main_Display_Name | Main_HER | Main_HER_PRN | Main_HER_ID | Main_NMR_Mapsheet | Main_NMR_ID | Main_SM | Main_Summary | Main_Boundary | Main_Coordinate_System | Main_X | Main_Y | Status_Citizen_Science | Status_Citizen | Status_Data_Reliability | Status_Data_Comments | Status_Interpretation_Reliability | Status_Interpretation_Comments | Location_NGR | Location_X | Location_Y | Location_Longitude | Location_Latitude | Location_Current_County | Location_Historic_County | Location_Current_Parish | Management_Condition_Extant | Management_Condition_Cropmark | Management_Condition_Destroyed | Management_Condition_Comments | Management_Land_Use_Woodland | Management_Land_Use_Plantation | Management_Land_Use_Parkland | Management_Land_Use_Pasture | Management_Land_Use_Arable | Management_Land_Use_Scrub | Management_Land_Use_Outcrop | Management_Land_Use_Moorland | Management_Land_Use_Heath | Management_Land_Use_Urban | Management_Land_Use_Coastal | Management_Land_Use_Other | Management_Land_Use_Comments | Landscape_Type_Contour | Landscape_Type_Partial | Landscape_Type_Promontory | Landscape_Type_Hillslope | Landscape_Type_Level | Landscape_Type_Marsh | Landscape_Type_Multiple | Landscape_Type_Comments | Landscape_Topography_Hilltop | Landscape_Topography_Coastal | Landscape_Topography_Inland | Landscape_Topography_Valley | Landscape_Topography_Knoll | Landscape_Topography_Ridge | Landscape_Topography_Scarp | Landscape_Topography_Hillslope | Landscape_Topography_Lowland | Landscape_Topography_Spur | Landscape_Topography_Comments | Landscape_Topography_Dominant | Landscape_Aspect_N | Landscape_Aspect_NE | Landscape_Aspect_E | Landscape_Aspect_SE | Landscape_Aspect_S | Landscape_Aspect_SW | Landscape_Aspect_W | Landscape_Aspect_NW | Landscape_Aspect_Level | Landscape_Altitude | Boundary_Boundary_Type | Boundary_Boundary_Comments | Boundary_Country_Code_2 | 
Boundary_HER_2 | Boundary_HER_PRN_2 | Boundary_Current_County_2 | Boundary_Historic_County_2 | Boundary_Current_Parish_2 | Dating_Date_Pre_1200BC | Dating_Date_1200BC_800BC | Dating_Date_800BC_400BC | Dating_Date_400BC_AD50 | Dating_Date_AD50_AD400 | Dating_Date_AD400_AD800 | Dating_Date_Post_AD800 | Dating_Date_Unknown | Dating_Date_Reliability | Dating_Date_Comments | Dating_Pre | Dating_Pre_Comments | Dating_Post | Dating_Post_Comments | Investigations_Summary | Interior_Summary | Interior_Water_None | Interior_Water_Spring | Interior_Water_Stream | Interior_Water_Pool | Interior_Water_Flush | Interior_Water_Well | Interior_Water_Other | Interior_Water_Comments | Interior_Surface_None | Interior_Surface_Round | Interior_Surface_Rectangular | Interior_Surface_Curvilinear | Interior_Surface_Roundhouse | Interior_Surface_Pit | Interior_Surface_Quarry | Interior_Surface_Other | Interior_Surface_Comments | Interior_Excavation_None | Interior_Excavation_Pit | Interior_Excavation_Posthole | Interior_Excavation_Roundhouse | Interior_Excavation_Rectangular | Interior_Excavation_Road | Interior_Excavation_Quarry | Interior_Excavation_Other | Interior_Excavation_Nothing | Interior_Excavation_Comments | Interior_Geophysics_None | Interior_Geophysics_Pit | Interior_Geophysics_Roundhouse | Interior_Geophysics_Rectangular | Interior_Geophysics_Road | Interior_Geophysics_Quarry | Interior_Geophysics_Other | Interior_Geophysics_Nothing | Interior_Geophysics_Comments | Interior_Finds_None | Interior_Finds_Pottery | Interior_Finds_Metal | Interior_Finds_Metalworking | Interior_Finds_Human | Interior_Finds_Animal | Interior_Finds_Lithics | Interior_Finds_Evironmental | Interior_Finds_Other | Interior_Finds_Comments | Interior_Aerial_Unchecked | Interior_Aerial_None | Interior_Aerial_Roundhouse | Interior_Aerial_Rectangular | Interior_Aerial_Pit | Interior_Aerial_Posthole | Interior_Aerial_Road | Interior_Aerial_Other | Interior_Aerial_Comments | Entrances_Breaks | 
Entrances_Breaks_Comments | Entrances_Original | Entrances_Original_Comments | Entrances_Guard_Chambers | Entrances_Chevaux | Entrances_Chevaux_Comments | Entrances_Summary | Enclosing_Summary | Enclosing_Area_1 | Enclosing_Area_2 | Enclosing_Area_3 | Enclosing_Area_4 | Enclosing_Enclosed_Area | Enclosing_Area | Enclosing_Multiperiod | Enclosing_Multiperiod_Comments | Enclosing_Circuit | Enclosing_Circuit_Comments | Enclosing_Max_Ramparts | Enclosing_NE_Quadrant | Enclosing_SE_Quadrant | Enclosing_SW_Quadrant | Enclosing_NW_Quadrant | Enclosing_Quadrant_Comments | Enclosing_Current_Part_Uni | Enclosing_Current_Uni | Enclosing_Current_Part_Bi | Enclosing_Current_Bi | Enclosing_Current_Part_Multi | Enclosing_Current_Multi | Enclosing_Current_Unknown | Enclosing_Period_Part_Uni | Enclosing_Period_Uni | Enclosing_Period_Part_Bi | Enclosing_Period_Bi | Enclosing_Period_Part_Multi | Enclosing_Period_Multi | Enclosing_Surface_None | Enclosing_Surface_Bank | Enclosing_Surface_Wall | Enclosing_Surface_Rubble | Enclosing_Surface_Walk | Enclosing_Surface_Timber | Enclosing_Surface_Vitrification | Enclosing_Surface_Burning | Enclosing_Surface_Palisade | Enclosing_Surface_Counter_Scarp | Enclosing_Surface_Berm | Enclosing_Surface_Unfinished | Enclosing_Surface_Other | Enclosing_Surface_Comments | Enclosing_Excavation_Nothing | Enclosing_Excavation_Bank | Enclosing_Excavation_Wall | Enclosing_Excavation_Murus | Enclosing_Excavation_Timber_Framed | Enclosing_Excavation_Timber_Laced | Enclosing_Excavation_Vitrification | Enclosing_Excavation_Burning | Enclosing_Excavation_Palisade | Enclosing_Excavation_Counter_Scarp | Enclosing_Excavation_Berm | Enclosing_Excavation_Unfinished | Enclosing_Excavation_No_Known | Enclosing_Excavation_Other | Enclosing_Excavation_Comments | Enclosing_Gang_Working | Enclosing_Gang_Working_Comments | Enclosing_Ditches | Enclosing_Ditches_Number | Enclosing_Ditches_Comments | Annex | Annex_Summary | References | URL_Atlas | URL_Wiki | URL_NMR_Resource | 
NMR_URL | URL_HER_Resource | URL_HER | Related_Dating_Evidence | Related_Investigations | Related_Entrances | Record_URL | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | EN | England | EN0001 Aconbury Camp, Herefordshire | Aconbury Camp | Aconbury Beacon | Aconbury Camp, Herefordshire (Aconbury Beacon) | Herefordshire | MHE413 | 910 | SO 53 SW 1 | 110371 | 1001754 | Large, wooded, univallate, partial contour hil... | No | OSGB36 | 350350 | 233050 | No | NaN | Confirmed | NaN | Confirmed | NaN | SO 503330 | -303295 | 6798973 | -2.724548 | 51.993628 | Herefordshire | Herefordshire | Aconbury | Yes | No | No | Main ditch gone on N and W sides. Visitor eros... | Yes | No | No | No | No | Yes | No | No | No | No | No | Yes | Mixed woodland since 19th century with interna... | No | Yes | Yes | No | No | No | No | Partial contour fort following the natural con... | Yes | No | Yes | No | No | No | No | No | No | No | NaN | Hill top, part promontory. | No | No | No | No | No | No | Yes | No | No | 276.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | No | No | Yes | Yes | Yes | No | No | No | C - Low | The finding of Iron Age and Roman pottery sugg... | No | NaN | Yes | Evidence of Civil War occupation and possible ... | In Aubrey's Monumenta Britannica (1665-1693). ... | Little information about interior was gleaned ... | Yes | No | No | No | No | No | No | Spring 0.3km located outside the hillfort | No | No | No | No | No | No | Yes | No | Little information is available from surface e... | Yes | No | No | No | No | No | No | No | No | NaN | Yes | No | No | No | No | No | No | No | NaN | No | Yes | No | No | No | No | No | No | No | Quantity of Iron Age sherds similar to those f... | Yes | No | No | No | No | No | No | No | NaN | 6.0 | Two original and four modern gaps. | 2.0 | Two original inturned entrances at SE and SW c... | No | No | NaN | Two original entrances; the SE inturned. The S... | Univallate hillfort with complete circuit, but... | 7.1 | NaN | NaN | NaN | 7.1 | 9.3 | No | Univallate hillfort with complete circuit. | Yes | Single rampart continues around circuit. 
| 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | No | Yes | No | No | No | No | No | No | No | No | No | No | No | No | Yes | No | No | No | No | No | No | No | No | Yes | No | Yes | Little surface evidence of features and the ba... | No | No | No | No | No | No | No | No | No | No | No | No | No | No | NaN | No | NaN | Yes | 1.0 | Main ditch only present on the S and E sides, ... | No | NaN | Dorling, P. and Wigley, A. 2012. Assessment of... | https://hillforts.arch.ox.ac.uk/?query=Atlas_o... | http://www.wikidata.org/entity/Q31113987 | NaN | NaN | NaN | NaN | Artefactual | 1st Identified Map Depiction (1888); Other (19... | In-turned (South east); In-turned (South west)... | http://hillforts.arch.ox.ac.uk/records/EN0001.... |
| 1 | 2 | 2 | EN | England | EN0002 Bach Camp, Herefordshire | Bach Camp | NaN | Bach Camp, Herefordshire | Herefordshire | MHE52 | 344 | SO 56 SW 3 | 110884 | 1007316 | Univallate, contour hillfort located on summit... | No | OSGB36 | 354700 | 260200 | No | NaN | Confirmed | NaN | Confirmed | NaN | SO 547602 | -296646 | 6843289 | -2.664819 | 52.238082 | Herefordshire | Herefordshire | Kimbolton | Yes | No | No | Natural and animal erosion with sheep scrapes.... | No | No | No | Yes | No | Yes | No | No | No | No | No | No | Potatoes once grown on the site, but vegetatio... | Yes | No | No | No | No | No | No | Univallate, contour hillfort located on summit... | Yes | No | No | No | No | No | No | No | No | Yes | NaN | Hill top spur. | No | No | No | No | No | No | No | No | Yes | 150.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | No | No | No | No | No | No | No | Yes | D - None | None | No | NaN | No | NaN | On 1st Ed. OS map (1888). Herefordshire Aerial... | None | Yes | No | No | No | No | No | No | Stream 0.1km located outside hillfort | Yes | No | No | No | No | No | No | No | NaN | Yes | No | No | No | No | No | No | No | No | NaN | Yes | No | No | No | No | No | No | No | NaN | Yes | No | No | No | No | No | No | No | No | NaN | Yes | No | No | No | No | No | No | No | NaN | 3.0 | N entrance damaged by wagon access and possibl... | 2.0 | S entrance original, that on the NW possibly ... | No | No | NaN | Entrances difficult to unravel. The S entrance... | Defined differentially by single rampart to 5.... | 4.1 | NaN | NaN | NaN | 4.1 | NaN | No | NaN | Yes | The ramparts are irregular which makes assessm... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | No | Yes | No | No | No | No | No | No | No | No | No | No | No | No | Yes | No | No | No | No | No | No | No | Yes | Yes | Yes | No | Bank possibly earthen. Counterscarp bank compl... 
| No | No | No | No | No | No | No | No | No | No | No | No | Yes | No | None | No | NaN | Yes | 1.0 | NaN | No | NaN | Dorling, P. and Wigley, A. 2012. Assessment of... | https://hillforts.arch.ox.ac.uk/?query=Atlas_o... | http://www.wikidata.org/entity/Q31113996 | NaN | NaN | NaN | NaN | NaN | 1st Identified Map Depiction (1888); Other (20... | In-turned (South); Simple Gap (North west); Ho... | http://hillforts.arch.ox.ac.uk/records/EN0002.... |
from google.colab import files
def download(data_list, filename, hf_data=hillforts_data):
    # Only build and download a package when download_data has been enabled
    if download_data == True:
        name_and_number = hf_data[['Main_Atlas_Number','Main_Display_Name']].copy()
        dl = name_and_number.copy()
        for pkg in data_list:
            # Regional packages are used unchanged; all others are joined to the name and number features
            if filename not in ['england', 'wales','scotland','republic-of-ireland','norhtern-ireland', 'isle-of-man', 'roi-ni', 'eng-wal-sco-iom']:
                # Skip empty or partial packages that do not cover every record
                if pkg.shape[0] == hillforts_data.shape[0]:
                    dl = pd.merge(dl, pkg, left_index=True, right_index=True)
            else:
                dl = data_list[0]
        # Replace carriage returns and line feeds so each record stays on one CSV row
        dl = dl.replace('\r',' ', regex=True)
        dl = dl.replace('\n',' ', regex=True)
        fn = 'hillforts_primer_' + filename
        fn = get_file_name(fn)
        dl.to_csv(fn+'.csv', index=False)
        files.download(fn+'.csv')
    else:
        pass
The Main Atlas Number and the Main Display Name are the primary unique reference identifiers in the data. With these, users can identify any record numerically and by name. Throughout this document, the data will be clipped into a number of sub-data packages. Where needed, these data extracts will be combined with Name and Number features to ensure the data can be understood and can, if needed, be concorded.
name_and_number_features = ['Main_Atlas_Number','Main_Display_Name']
name_and_number = hillforts_data[name_and_number_features].copy()
name_and_number.head()
| | Main_Atlas_Number | Main_Display_Name |
|---|---|---|
| 0 | 1 | Aconbury Camp, Herefordshire (Aconbury Beacon) |
| 1 | 2 | Bach Camp, Herefordshire |
| 2 | 3 | Backbury Camp, Herefordshire (Ethelbert's Camp) |
| 3 | 4 | Brandon Camp, Herefordshire |
| 4 | 5 | British Camp, Herefordshire (Herefordshire Bea... |
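As a hedged illustration of how a data extract can be concorded with the Name and Number features, the sketch below joins a small stand-in extract to its identifiers on the shared row index. The frames are toy stand-ins rather than the Atlas data, and `name_and_number_demo`, `extract` and `Example_Feature` are hypothetical names used only for this example.

```python
import pandas as pd

# Toy stand-in for name_and_number (values copied from the first rows above).
name_and_number_demo = pd.DataFrame({
    'Main_Atlas_Number': [1, 2, 3],
    'Main_Display_Name': [
        'Aconbury Camp, Herefordshire (Aconbury Beacon)',
        'Bach Camp, Herefordshire',
        "Backbury Camp, Herefordshire (Ethelbert's Camp)",
    ],
})

# Hypothetical extract that shares the same row index.
extract = pd.DataFrame({'Example_Feature': ['Yes', 'No', 'No']})

# Join on the index so every row keeps its identifiers.
concorded = pd.merge(name_and_number_demo, extract,
                     left_index=True, right_index=True)
print(concorded)
```

Because the join uses the row index rather than a key column, it only gives meaningful results while the extract keeps the index of the frame it was cut from.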
location_numeric_data_short_features = ['Location_X','Location_Y']
location_numeric_data_short = hillforts_data[location_numeric_data_short_features]
location_numeric_data_short = add_density(location_numeric_data_short)
location_numeric_data_short.head()
location_data = location_numeric_data_short.copy()
location_data.head()
| | Location_X | Location_Y | Density |
|---|---|---|---|
| 0 | -303295 | 6798973 | 1.632859e-12 |
| 1 | -296646 | 6843289 | 1.540172e-12 |
| 2 | -289837 | 6808611 | 1.547729e-12 |
| 3 | -320850 | 6862993 | 1.670548e-12 |
| 4 | -261765 | 6810587 | 1.369981e-12 |
cluster_data = hillforts_data[['Location_X','Location_Y', 'Main_Country_Code']].copy()
cluster_data['Cluster'] = 'NA'
cluster_data['Cluster'] = cluster_data['Cluster'].where(cluster_data['Main_Country_Code'] != 'NI', 'I')
cluster_data['Cluster'] = cluster_data['Cluster'].where(cluster_data['Main_Country_Code'] != 'IR', 'I')
cluster_data['Cluster'] = np.where(
(cluster_data['Cluster'] == 'I') & (cluster_data['Location_Y'] >= 7060000) , 'North Ireland', cluster_data['Cluster']
)
north_ireland = cluster_data[cluster_data['Cluster'] == 'North Ireland'].copy()
cluster_data['Cluster'] = np.where(
(cluster_data['Cluster'] == 'I') & (cluster_data['Location_Y'] < 7060000) , 'South Ireland', cluster_data['Cluster']
)
south_ireland = cluster_data[cluster_data['Cluster'] == 'South Ireland'].copy()
cluster_data['Cluster'] = np.where(
(cluster_data['Cluster'] == 'NA') & (cluster_data['Location_Y'] < 7070000) , 'South', cluster_data['Cluster']
)
south = cluster_data[cluster_data['Cluster'] == 'South'].copy()
cluster_data['Cluster'] = np.where(
(cluster_data['Cluster'] == 'NA') & (cluster_data['Location_Y'] >= 7070000) & (cluster_data['Location_X'] >= -500000), 'Northeast', cluster_data['Cluster']
)
north_east = cluster_data[cluster_data['Cluster'] == 'Northeast'].copy()
cluster_data['Cluster'] = np.where(
(cluster_data['Cluster'] == 'NA') & (cluster_data['Location_Y'] >= 7070000) & (cluster_data['Location_X'] < -500000), 'Northwest', cluster_data['Cluster']
)
north_west = cluster_data[cluster_data['Cluster'] == 'Northwest'].copy()
temp_cluster_location_packages = [north_ireland, south_ireland, south, north_east, north_west]
cluster_packages = []
for pkg in temp_cluster_location_packages:
    pkg = pkg.drop(['Main_Country_Code'], axis=1)
    cluster_packages.append(pkg)
north_ireland, south_ireland, south, north_east, north_west = cluster_packages[0], cluster_packages[1], cluster_packages[2], cluster_packages[3], cluster_packages[4]
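The regional split above can be summarised as one decision rule per record. The sketch below restates that rule as a plain function. `assign_cluster` is a hypothetical name, not part of the notebook, and it takes the country codes and thresholds directly from the cells above.

```python
def assign_cluster(country_code, x, y):
    """Return the regional cluster label for one record.

    x and y are the projected Location_X and Location_Y values.
    """
    if country_code in ('NI', 'IR'):
        # Irish records are split north/south at Y = 7,060,000.
        return 'North Ireland' if y >= 7060000 else 'South Ireland'
    if y < 7070000:
        return 'South'
    # Remaining northern records are split east/west at X = -500,000.
    return 'Northeast' if x >= -500000 else 'Northwest'


# Aconbury Camp (EN), using the coordinates from the first record.
print(assign_cluster('EN', -303295, 6798973))  # South
```

The notebook's vectorised `np.where` calls will be much faster on the full table. The function form is only meant to make the thresholds easier to read and test.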
The boundary data contains eight features.
boundary_features = [
'Boundary_Boundary_Type',
'Boundary_Boundary_Comments',
'Boundary_Country_Code_2',
'Boundary_HER_2',
'Boundary_HER_PRN_2',
'Boundary_Current_County_2',
'Boundary_Historic_County_2',
'Boundary_Current_Parish_2']
boundary_data = hillforts_data[boundary_features].copy()
boundary_data.head()
| | Boundary_Boundary_Type | Boundary_Boundary_Comments | Boundary_Country_Code_2 | Boundary_HER_2 | Boundary_HER_PRN_2 | Boundary_Current_County_2 | Boundary_Historic_County_2 | Boundary_Current_Parish_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | County | NaN | NaN | Worcestershire | WSM00932 | Worcestershire | Worcestershire | Eastnor (Herefordshire); Little Malvern (Worce... |
The boundary data is partial. Five of the features contain 20 records or fewer. There is so little data in these features that their distributions are not useful. These five features will be dropped from the reprocessed download.
boundary_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 8 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Boundary_Boundary_Type      428 non-null    object
 1   Boundary_Boundary_Comments  259 non-null    object
 2   Boundary_Country_Code_2     7 non-null      object
 3   Boundary_HER_2              12 non-null     object
 4   Boundary_HER_PRN_2          10 non-null     object
 5   Boundary_Current_County_2   20 non-null     object
 6   Boundary_Historic_County_2  17 non-null     object
 7   Boundary_Current_Parish_2   360 non-null    object
dtypes: object(8)
memory usage: 259.3+ KB
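To make the sparse features explicit, the sketch below filters the non-null counts reported by `info()` against the 20-record threshold mentioned in the text. The counts are copied from the output above. `MAX_SPARSE_RECORDS`, `sparse_features` and `retained_features` are hypothetical names introduced for illustration.

```python
# Non-null counts copied from the info() output above.
non_null_counts = {
    'Boundary_Boundary_Type': 428,
    'Boundary_Boundary_Comments': 259,
    'Boundary_Country_Code_2': 7,
    'Boundary_HER_2': 12,
    'Boundary_HER_PRN_2': 10,
    'Boundary_Current_County_2': 20,
    'Boundary_Historic_County_2': 17,
    'Boundary_Current_Parish_2': 360,
}

MAX_SPARSE_RECORDS = 20  # threshold taken from the text

# Features at or below the threshold are candidates for dropping.
sparse_features = [name for name, count in non_null_counts.items()
                   if count <= MAX_SPARSE_RECORDS]
retained_features = [name for name in non_null_counts
                     if name not in sparse_features]

print(sparse_features)
print(retained_features)
```

On the live data, `boundary_data.notna().sum()` should return the same counts as a pandas Series, so the dictionary would not need to be typed by hand.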
There is no Boundary numeric data.
boundary_numeric_data = pd.DataFrame()
There is a single boundary text feature.
boundary_text_features = [
'Boundary_Boundary_Comments']
boundary_text_data = boundary_data[boundary_text_features].copy()
boundary_text_data[boundary_text_data['Boundary_Boundary_Comments'].notna()].head()
| | Boundary_Boundary_Comments |
|---|---|
| 9 | Part of the site is located in Shropshire and ... |
| 40 | Part in Shropshire and part in Wales (Powys). |
| 70 | Part of site in Wales (Powys), part in England... |
| 95 | Formerly bisected by the historic counties of ... |
| 96 | Although situated entirely in Thatcham it lies... |
Test for 'NA'.
test_cat_list_for_NA(boundary_text_data, boundary_text_features)
Boundary_Boundary_Comments 0
Fill null values with 'NA'.
boundary_text_data = update_cat_list_for_NA(boundary_text_data, boundary_text_features)
boundary_text_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 1 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Boundary_Boundary_Comments  4147 non-null   object
dtypes: object(1)
memory usage: 32.5+ KB
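The helpers `test_cat_list_for_NA` and `update_cat_list_for_NA` are defined elsewhere in the Primer and are not shown in this part. As a rough sketch of what such helpers might do, assuming they simply count and fill the literal string 'NA', the hypothetical functions `count_na_strings` and `fill_nulls_with_na` below reproduce the two steps on a toy frame.

```python
import pandas as pd


def count_na_strings(df, features):
    """Count cells already holding the literal string 'NA' in each feature."""
    return {feature: int((df[feature] == 'NA').sum()) for feature in features}


def fill_nulls_with_na(df, features):
    """Return a copy with null values in the given features replaced by 'NA'."""
    out = df.copy()
    out[features] = out[features].fillna('NA')
    return out


# Toy column with one comment and two null values.
demo = pd.DataFrame({'Comments': ['Part in Shropshire', None, float('nan')]})

print(count_na_strings(demo, ['Comments']))
filled = fill_nulls_with_na(demo, ['Comments'])
print(filled['Comments'].tolist())
```

Checking for an existing 'NA' string first matters because, if a feature already used 'NA' as a real value, filling nulls with the same string would merge two different meanings.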
There are seven Boundary encodable features. As mentioned above, five of these features contain so little data they will be dropped from the download. Before being dropped, the data will be plotted.
boundary_encodeable_features = [
'Boundary_Boundary_Type',
'Boundary_Country_Code_2',
'Boundary_HER_2',
'Boundary_HER_PRN_2',
'Boundary_Current_County_2',
'Boundary_Historic_County_2',
'Boundary_Current_Parish_2']
boundary_encodeable_data = boundary_data[boundary_encodeable_features].copy()
boundary_encodeable_data.head()
| | Boundary_Boundary_Type | Boundary_Country_Code_2 | Boundary_HER_2 | Boundary_HER_PRN_2 | Boundary_Current_County_2 | Boundary_Historic_County_2 | Boundary_Current_Parish_2 |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | County | NaN | Worcestershire | WSM00932 | Worcestershire | Worcestershire | Eastnor (Herefordshire); Little Malvern (Worce... |
boundary_encodeable_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 7 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Boundary_Boundary_Type      428 non-null    object
 1   Boundary_Country_Code_2     7 non-null      object
 2   Boundary_HER_2              12 non-null     object
 3   Boundary_HER_PRN_2          10 non-null     object
 4   Boundary_Current_County_2   20 non-null     object
 5   Boundary_Historic_County_2  17 non-null     object
 6   Boundary_Current_Parish_2   360 non-null    object
dtypes: object(7)
memory usage: 226.9+ KB
All features in this dataset contain null values. 'NA' is not currently present in any feature and will be used to replace null values.
test_cat_list_for_NA(boundary_encodeable_data, boundary_encodeable_features)
Boundary_Boundary_Type 0
Boundary_Country_Code_2 0
Boundary_HER_2 0
Boundary_HER_PRN_2 0
Boundary_Current_County_2 0
Boundary_Historic_County_2 0
Boundary_Current_Parish_2 0
Null values updated to 'NA'.
boundary_encodeable_data = update_cat_list_for_NA(boundary_encodeable_data, boundary_encodeable_features)
boundary_encodeable_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 7 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Boundary_Boundary_Type      4147 non-null   object
 1   Boundary_Country_Code_2     4147 non-null   object
 2   Boundary_HER_2              4147 non-null   object
 3   Boundary_HER_PRN_2          4147 non-null   object
 4   Boundary_Current_County_2   4147 non-null   object
 5   Boundary_Historic_County_2  4147 non-null   object
 6   Boundary_Current_Parish_2   4147 non-null   object
dtypes: object(7)
memory usage: 226.9+ KB
The majority of hillforts (3719/89.68%) have no Boundary Type information.
boundary_encodeable_data['Boundary_Boundary_Type'].value_counts()
NA                 3719
Parish/Townland     391
County               22
Other                 8
National              7
Name: Boundary_Boundary_Type, dtype: int64
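The percentages quoted in this section follow directly from these counts. As a small worked check, using only the figures printed above, each share is the count divided by the 4147 records in the table. `boundary_type_counts` is a hypothetical name for the copied values.

```python
# Counts copied from the value_counts() output above.
boundary_type_counts = {
    'NA': 3719,
    'Parish/Townland': 391,
    'County': 22,
    'Other': 8,
    'National': 7,
}

total = sum(boundary_type_counts.values())  # 4147 records

# Express each count as a percentage of all records, to two decimal places.
for label, count in boundary_type_counts.items():
    share = round(100 * count / total, 2)
    print(f'{label}: {count} ({share}%)')
```

On the live data, `value_counts(normalize=True)` returns the same shares as fractions.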
x_data = []
for bdry in list(pd.unique(boundary_encodeable_data['Boundary_Boundary_Type'])):
    x_data.append(bdry)
y_data = []
for entry in x_data:
    count = len(boundary_encodeable_data[boundary_encodeable_data['Boundary_Boundary_Type'] == entry])
    y_data.append(count)
plot_bar_chart_using_two_tables(x_data, y_data, 'Boundary Type', 'Count', 'Boundary_Boundary_Type')
Saving figure hillforts_primer_part03-001.png
Where a Boundary Type has been recorded, Parish/Townland is the most common (391). There are a small number of hillforts on a county boundary (22) and an even smaller number on a national border (7).
x_data_short = x_data[1:]
y_data_short = y_data[1:]
plot_bar_chart_using_two_tables(x_data_short, y_data_short, 'Boundary Type', 'Count', 'Boundary_Type_Excluding_Null_Values')
Saving figure hillforts_primer_part03-002.png
location_boundary_data = pd.merge(location_numeric_data_short, boundary_encodeable_data, left_index=True, right_index=True)
3719 hillforts (89.68%) have no associated boundary information.
plot_over_grey_boundary(location_boundary_data, 'Boundary_Boundary_Type', 'NA')
Saving figure hillforts_primer_part03-003.png
89.68%
Only 22 hillforts have a relationship to county boundaries.
plot_over_grey_boundary(location_boundary_data, 'Boundary_Boundary_Type', 'County')
Saving figure hillforts_primer_part03-004.png
0.53%
Only seven hillforts are recorded as having been built on what is now a national boundary.
plot_over_grey_boundary(location_boundary_data, 'Boundary_Boundary_Type', 'National')
Saving figure hillforts_primer_part03-005.png
0.17%
This distribution contains a survey bias. The data for England and the Republic of Ireland looks to have some coherence but the data for Scotland, Wales and Northern Ireland is patchy. There also seems to be recording bias, caused by localised intense recording, around Berwick and various locations across the south of England. See: Boundary Current Parish 2 Mapped.
plot_over_grey_boundary(location_boundary_data, 'Boundary_Boundary_Type', 'Parish/Townland')
Saving figure hillforts_primer_part03-006.png
9.43%
Eight hillforts coincide with boundaries classified as 'Other'.
plot_over_grey_boundary(location_boundary_data, 'Boundary_Boundary_Type', 'Other')
Saving figure hillforts_primer_part03-007.png
0.19%
Seven hillforts have a Boundary Country Code 2. Three codes are used: England (EN), Wales (WA) and Scotland (SC).
boundary_encodeable_data['Boundary_Country_Code_2'].value_counts()
NA    4140
EN       4
WA       2
SC       1
Name: Boundary_Country_Code_2, dtype: int64
There are 12 hillforts with a Boundary HER 2. There are nine values, of which most have a single entry. All are HER service names.
boundary_encodeable_data['Boundary_HER_2'].value_counts()
NA                       4135
Worcestershire              2
Shropshire                  2
Clwyd Powys                 2
West Berkshire              1
Dudley                      1
Hampshire                   1
Fife Council                1
East Sussex                 1
Wiltshire and Swindon       1
Name: Boundary_HER_2, dtype: int64
There are ten hillforts with a HER PRN 2 value. There are nine unique values, of which most have a single entry. All are HER PRN ID numbers.
boundary_encodeable_data['Boundary_HER_PRN_2'].value_counts()
NA          4137
WSM00932       2
MSA868         1
19259          1
MSA828         1
MWB3075        1
7097           1
3252           1
28636          1
MWI17466       1
Name: Boundary_HER_PRN_2, dtype: int64
There are 20 hillforts with a Boundary Current County 2. There are 15 unique values, of which most have a single entry. All are county names.
boundary_encodeable_data['Boundary_Current_County_2'].value_counts()
NA                        4127
Worcestershire               3
Shropshire                   2
Powys                        2
Wiltshire                    2
West Berkshire               1
West Midlands                1
Somerset                     1
Flintshire                   1
Denbighshire                 1
Hampshire                    1
Fife                         1
Hertfordshire                1
Windsor and Maidenhead       1
Cornwall                     1
East Sussex                  1
Name: Boundary_Current_County_2, dtype: int64
There are 17 hillforts with a Boundary Historic County 2. There are 13 unique values, of which most have a single entry. All are historic county names, although 'Mointgomeryshire' looks to be a misspelling of Montgomeryshire in the source data.
boundary_encodeable_data['Boundary_Historic_County_2'].value_counts()
NA                  4130
Worcestershire         3
Shropshire             2
Denbighshire           2
Montgomeryshire        1
Hampshire              1
Berkshire              1
Mointgomeryshire       1
Staffordshire          1
Somerset               1
Fife                   1
Selkirkshire           1
Sussex                 1
Wiltshire              1
Name: Boundary_Historic_County_2, dtype: int64
There are 360 hillforts with a Boundary Current Parish 2. The four most frequent parish values are listed below.
boundary_encodeable_data['Boundary_Current_Parish_2'].value_counts()[:5]
NA            3787
Branxton         4
Chatton          4
Doddington       3
Lowick           3
Name: Boundary_Current_Parish_2, dtype: int64
This feature is very similar to, and suffers the same survey bias as, that detailed in Parish / Townland Boundary Mapped.
temp_boundary_data_plus = boundary_encodeable_data.copy()
temp_boundary_data_plus.where(temp_boundary_data_plus['Boundary_Current_Parish_2'] == 'NA', 'Yes', inplace=True)
temp_boundary_data_plus.head()
| | Boundary_Boundary_Type | Boundary_Country_Code_2 | Boundary_HER_2 | Boundary_HER_PRN_2 | Boundary_Current_County_2 | Boundary_Historic_County_2 | Boundary_Current_Parish_2 |
|---|---|---|---|---|---|---|---|
| 0 | NA | NA | NA | NA | NA | NA | NA |
| 1 | NA | NA | NA | NA | NA | NA | NA |
| 2 | NA | NA | NA | NA | NA | NA | NA |
| 3 | NA | NA | NA | NA | NA | NA | NA |
| 4 | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
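The behaviour of `where` in the cell above can be surprising: it keeps a row's values where the condition is True and overwrites every column of the row where it is False. The sketch below shows this on a toy frame. `demo`, `has_no_parish` and `flagged` are hypothetical names, and the parish value is shortened for the example.

```python
import pandas as pd

demo = pd.DataFrame({
    'Boundary_Boundary_Type': ['NA', 'County', 'NA'],
    'Boundary_Current_Parish_2': ['NA', 'Eastnor', 'NA'],
})

# True for rows with no second parish recorded.
has_no_parish = demo['Boundary_Current_Parish_2'] == 'NA'

# Rows where the condition is True are kept unchanged; rows where it is
# False have 'Yes' written into every column.
flagged = demo.where(has_no_parish, 'Yes')
print(flagged)
```

This is why row 4 in the table above reads 'Yes' in all seven columns. Only the Boundary_Current_Parish_2 column is used for the plot that follows, so the overwritten values in the other columns do not affect it.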
temp_location_boundary_data = pd.merge(location_numeric_data_short, temp_boundary_data_plus, left_index=True, right_index=True)
Boundary_Current_Parish_2_stats = plot_over_grey(temp_location_boundary_data, 'Boundary_Current_Parish_2', 'Yes')
Saving figure hillforts_primer_part03-008.png
8.68%
review_data_split(boundary_data, boundary_numeric_data, boundary_text_data, boundary_encodeable_data)
Data split good.
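`review_data_split` is defined elsewhere in the Primer. One plausible reading of its check, offered here only as an assumption, is that the numeric, text and encodeable extracts should between them contain every column of the original data exactly once. The hypothetical function `check_data_split` sketches that idea.

```python
import pandas as pd


def check_data_split(full, *parts):
    """Return True when the parts' columns exactly partition the full frame's columns."""
    part_columns = [col for part in parts for col in part.columns]
    no_duplicates = len(part_columns) == len(set(part_columns))
    return no_duplicates and set(part_columns) == set(full.columns)


# Toy frame split three ways, with an empty numeric part as in the boundary data.
full = pd.DataFrame({'a': [1], 'b': ['text'], 'c': ['Yes']})
numeric = pd.DataFrame()
text = full[['b']]
encodeable = full[['a', 'c']]

print(check_data_split(full, numeric, text, encodeable))
```

For the boundary data the column counts are consistent with this reading: no numeric features, one text feature and seven encodeable features account for all eight columns.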
Only Boundary Type and Current Parish 2 will be retained in the download data package. The remaining boundary features do not contain sufficient data and would likely be misleading if used in a machine learning model.
boundary_encodable_features_short = [
'Boundary_Boundary_Type',
'Boundary_Current_Parish_2']
boundary_encodable_short = boundary_encodeable_data[boundary_encodable_features_short].copy()
boundary_encodable_short.head()
| | Boundary_Boundary_Type | Boundary_Current_Parish_2 |
|---|---|---|
| 0 | NA | NA |
| 1 | NA | NA |
| 2 | NA | NA |
| 3 | NA | NA |
| 4 | County | Eastnor (Herefordshire); Little Malvern (Worce... |
boundary_data_list = [boundary_numeric_data, boundary_text_data, boundary_encodable_short]
If you do not wish to download the data using this document, all the processed data packages, notebooks and images are available here:
https://github.com/MikeDairsie/Hillforts-Primer.
download(boundary_data_list, 'Boundary_package')
There are 15 Dating features. The first eight record period date ranges. Two features record dating evidence of activity prior to construction and two record activity after abandonment. The remaining three features record dating reliability, related dating evidence and general dating comments.
dating_features = [
'Dating_Date_Pre_1200BC',
'Dating_Date_1200BC_800BC',
'Dating_Date_800BC_400BC',
'Dating_Date_400BC_AD50',
'Dating_Date_AD50_AD400',
'Dating_Date_AD400_AD800',
'Dating_Date_Post_AD800',
'Dating_Date_Unknown',
'Dating_Date_Reliability',
'Dating_Date_Comments',
'Dating_Pre',
'Dating_Pre_Comments',
'Dating_Post',
'Dating_Post_Comments',
'Related_Dating_Evidence']
dating_data = hillforts_data[dating_features].copy()
dating_data.head()
| | Dating_Date_Pre_1200BC | Dating_Date_1200BC_800BC | Dating_Date_800BC_400BC | Dating_Date_400BC_AD50 | Dating_Date_AD50_AD400 | Dating_Date_AD400_AD800 | Dating_Date_Post_AD800 | Dating_Date_Unknown | Dating_Date_Reliability | Dating_Date_Comments | Dating_Pre | Dating_Pre_Comments | Dating_Post | Dating_Post_Comments | Related_Dating_Evidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No | No | Yes | Yes | Yes | No | No | No | C - Low | The finding of Iron Age and Roman pottery sugg... | No | NaN | Yes | Evidence of Civil War occupation and possible ... | Artefactual |
| 1 | No | No | No | No | No | No | No | Yes | D - None | None | No | NaN | No | NaN | NaN |
| 2 | No | No | No | No | No | No | No | Yes | D - None | The chance finding of a number of cloudy blue ... | No | None | No | None | NaN |
| 3 | No | No | No | No | Yes | Yes | No | No | B - Medium | Iron Age to Roman and possible later enclosure... | Yes | Possible Bronze Age ring ditch could indicate ... | Yes | Possible later Roman or post-Roman enclosure. | Artefactual |
| 4 | No | No | Yes | Yes | Yes | No | Yes | No | C - Low | The earlier enclosure of Phase I could be late... | No | There is no evidence of pre-hillfort activity,... | Yes | The ringwork is thought to be of medieval date. | Morphology/Earthwork/Typology |
All the period features and the 'pre' and 'post' features contain yes/no responses. These contain no null entries. All the remaining features contain empty records. Reliability and Related dating evidence contain controlled vocabularies (data derived from a pick list).
dating_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Dating_Date_Pre_1200BC    4147 non-null   object
 1   Dating_Date_1200BC_800BC  4147 non-null   object
 2   Dating_Date_800BC_400BC   4147 non-null   object
 3   Dating_Date_400BC_AD50    4147 non-null   object
 4   Dating_Date_AD50_AD400    4147 non-null   object
 5   Dating_Date_AD400_AD800   4147 non-null   object
 6   Dating_Date_Post_AD800    4147 non-null   object
 7   Dating_Date_Unknown       4147 non-null   object
 8   Dating_Date_Reliability   4134 non-null   object
 9   Dating_Date_Comments      4116 non-null   object
 10  Dating_Pre                4147 non-null   object
 11  Dating_Pre_Comments       448 non-null    object
 12  Dating_Post               4147 non-null   object
 13  Dating_Post_Comments      1961 non-null   object
 14  Related_Dating_Evidence   805 non-null    object
dtypes: object(15)
memory usage: 486.1+ KB
There is no numeric Dating Data.
dating_numeric_data = pd.DataFrame()
There are three Dating text features.
dating_text_features = [
'Dating_Date_Comments',
'Dating_Pre_Comments',
'Dating_Post_Comments']
dating_text_data = dating_data[dating_text_features].copy()
dating_text_data.head()
| | Dating_Date_Comments | Dating_Pre_Comments | Dating_Post_Comments |
|---|---|---|---|
| 0 | The finding of Iron Age and Roman pottery sugg... | NaN | Evidence of Civil War occupation and possible ... |
| 1 | None | NaN | NaN |
| 2 | The chance finding of a number of cloudy blue ... | None | None |
| 3 | Iron Age to Roman and possible later enclosure... | Possible Bronze Age ring ditch could indicate ... | Possible later Roman or post-Roman enclosure. |
| 4 | The earlier enclosure of Phase I could be late... | There is no evidence of pre-hillfort activity,... | The ringwork is thought to be of medieval date. |
Test for 'NA'.
test_cat_list_for_NA(dating_text_data, dating_text_features)
Dating_Date_Comments 0
Dating_Pre_Comments 0
Dating_Post_Comments 0
Fill null values with 'NA'.
dating_text_data = update_cat_list_for_NA(dating_text_data, dating_text_features)
dating_text_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 3 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Dating_Date_Comments  4147 non-null   object
 1   Dating_Pre_Comments   4147 non-null   object
 2   Dating_Post_Comments  4147 non-null   object
dtypes: object(3)
memory usage: 97.3+ KB
dating_encodeable_features = [
'Dating_Date_Pre_1200BC',
'Dating_Date_1200BC_800BC',
'Dating_Date_800BC_400BC',
'Dating_Date_400BC_AD50',
'Dating_Date_AD50_AD400',
'Dating_Date_AD400_AD800',
'Dating_Date_Post_AD800',
'Dating_Date_Unknown',
'Dating_Date_Reliability',
'Dating_Pre',
'Dating_Post',
'Related_Dating_Evidence']
dating_encodeable_data = dating_data[dating_encodeable_features].copy()
dating_encodeable_data.head()
| | Dating_Date_Pre_1200BC | Dating_Date_1200BC_800BC | Dating_Date_800BC_400BC | Dating_Date_400BC_AD50 | Dating_Date_AD50_AD400 | Dating_Date_AD400_AD800 | Dating_Date_Post_AD800 | Dating_Date_Unknown | Dating_Date_Reliability | Dating_Pre | Dating_Post | Related_Dating_Evidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No | No | Yes | Yes | Yes | No | No | No | C - Low | No | Yes | Artefactual |
| 1 | No | No | No | No | No | No | No | Yes | D - None | No | No | NaN |
| 2 | No | No | No | No | No | No | No | Yes | D - None | No | No | NaN |
| 3 | No | No | No | No | Yes | Yes | No | No | B - Medium | Yes | Yes | Artefactual |
| 4 | No | No | Yes | Yes | Yes | No | Yes | No | C - Low | No | Yes | Morphology/Earthwork/Typology |
review_data_split(dating_data, dating_numeric_data, dating_text_data, dating_encodeable_data)
Data split good.
The majority of hillforts have no date information. The maps below show that the data contains a recording bias: most of the dating information comes from southern England, and there is very little from outside this area.
date_features = [
'Dating_Date_Pre_1200BC',
'Dating_Date_1200BC_800BC',
'Dating_Date_800BC_400BC',
'Dating_Date_400BC_AD50',
'Dating_Date_AD50_AD400',
'Dating_Date_AD400_AD800',
'Dating_Date_Post_AD800',
'Dating_Date_Unknown']
date_data = dating_encodeable_data[date_features]
date_data.head()
| | Dating_Date_Pre_1200BC | Dating_Date_1200BC_800BC | Dating_Date_800BC_400BC | Dating_Date_400BC_AD50 | Dating_Date_AD50_AD400 | Dating_Date_AD400_AD800 | Dating_Date_Post_AD800 | Dating_Date_Unknown |
|---|---|---|---|---|---|---|---|---|
| 0 | No | No | Yes | Yes | Yes | No | No | No |
| 1 | No | No | No | No | No | No | No | Yes |
| 2 | No | No | No | No | No | No | No | Yes |
| 3 | No | No | No | No | Yes | Yes | No | No |
| 4 | No | No | Yes | Yes | Yes | No | Yes | No |
The majority of hillforts are undated.
plot_bar_chart(date_data, 2, 'Dating', 'Count', 'Dating')
Saving figure hillforts_primer_part03-009.png
For the relatively few hillforts that have dating evidence, the majority are dated to the Iron Age (800BC to AD50), with the largest cluster in the late Iron Age (400BC to AD50). There are relatively few pre-Iron Age dates. In contrast, there are a number of Roman and early medieval dates (AD50 to AD800), with most falling at the earlier end, between AD50 and AD400.
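The per-period totals behind this comparison come from counting the 'Yes' responses in each period feature. A minimal sketch of that count is shown below, using only the first five records from the table above as toy data. `demo` and `yes_counts` are hypothetical names, and the figures it prints describe these five rows only, not the full Atlas.

```python
import pandas as pd

# Toy rows copied from the first five records shown above.
demo = pd.DataFrame({
    'Dating_Date_800BC_400BC': ['Yes', 'No', 'No', 'No', 'Yes'],
    'Dating_Date_400BC_AD50': ['Yes', 'No', 'No', 'No', 'Yes'],
    'Dating_Date_AD50_AD400': ['Yes', 'No', 'No', 'Yes', 'Yes'],
})

# Comparing to 'Yes' gives booleans; summing counts the True values per column.
yes_counts = (demo == 'Yes').sum()
print(yes_counts)
```

Note that a single hillfort can be 'Yes' in several periods, so the per-period counts are not mutually exclusive and will not sum to the number of dated hillforts.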
date_features_minus = [
'Dating_Date_Pre_1200BC',
'Dating_Date_1200BC_800BC',
'Dating_Date_800BC_400BC',
'Dating_Date_400BC_AD50',
'Dating_Date_AD50_AD400',
'Dating_Date_AD400_AD800',
'Dating_Date_Post_AD800']
date_data_minus = hillforts_data[date_features_minus]
date_data_minus.head()
| | Dating_Date_Pre_1200BC | Dating_Date_1200BC_800BC | Dating_Date_800BC_400BC | Dating_Date_400BC_AD50 | Dating_Date_AD50_AD400 | Dating_Date_AD400_AD800 | Dating_Date_Post_AD800 |
|---|---|---|---|---|---|---|---|
| 0 | No | No | Yes | Yes | Yes | No | No |
| 1 | No | No | No | No | No | No | No |
| 2 | No | No | No | No | No | No | No |
| 3 | No | No | No | No | Yes | Yes | No |
| 4 | No | No | Yes | Yes | Yes | No | Yes |
plot_bar_chart(date_data_minus, 2, 'Dating', 'Count', 'Dating_Dated_only')
Saving figure hillforts_primer_part03-010.png
location_date_data = pd.merge(location_numeric_data_short, date_data, left_index=True, right_index=True)
There are only 12 hillforts which have produced dates pre 1200BC. A density plot has not been produced as there is insufficient data.
dt_1200 = plot_over_grey(location_date_data, 'Dating_Date_Pre_1200BC', 'Yes')
Saving figure hillforts_primer_part03-011.png
0.29%
Only 67 hillforts have a date between 1200 BC and 800 BC. A density plot has not been produced as there is insufficient data.
dt_1200_800 = plot_over_grey(location_date_data, 'Dating_Date_1200BC_800BC', 'Yes')
Saving figure hillforts_primer_part03-012.png
1.62%
Hillforts dated 800 BC to 400 BC are located predominantly in south central England. There is a clear bias in this distribution, with only a single fort in Ireland and very few in Scotland, Wales and south-west England.
dt_800_400 = plot_over_grey(location_date_data, 'Dating_Date_800BC_400BC', 'Yes')
Saving figure hillforts_primer_part03-013.png
6.99%
Although the density for this distribution has been produced, it looks to be a highly misleading cluster due to a recording bias.
plot_density_over_grey(dt_800_400, 'Dating_Date_800BC_400BC')
Saving figure hillforts_primer_part03-014.png
Plotting the orbits of Oxford University and the head office of Historic England shows an almost exact correlation between the overlap of these two orbits and the densest concentration of records in this cluster. This suggests the distribution is more likely to be the result of concentrated sampling in this area than a meaningful distribution relating to this period. What the cluster does show is that, in this concentrated area, many hillforts have dates in this range.
date_800bc_400bc = plot_over_grey(location_date_data, 'Dating_Date_800BC_400BC', 'Yes','', False, False, True, True)
Saving figure hillforts_primer_part03-015.png
6.99%
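The idea of overlapping orbits can be sketched as a simple distance test: a hillfort falls in the overlap when it lies within a chosen radius of both centres. The notebook's own orbit plotting is handled inside `plot_over_grey` and is not shown here, so everything below is an assumption made for illustration: the centre coordinates are approximate, `ORBIT_RADIUS` is a hypothetical value, and the function names are invented. The Location_X and Location_Y values appear consistent with Web Mercator (EPSG:3857), judging by the first record, so the standard spherical Mercator formulas are used for the conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x, y."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def in_both_orbits(point, centre_a, centre_b, radius):
    """Return True when point lies within radius of both centres."""
    return (math.dist(point, centre_a) <= radius
            and math.dist(point, centre_b) <= radius)


# Approximate, assumed centres (lon, lat); not taken from the notebook.
oxford = to_web_mercator(-1.26, 51.75)
swindon = to_web_mercator(-1.78, 51.56)
ORBIT_RADIUS = 100_000  # hypothetical radius in projected units

# A point halfway between the two centres, and the first record (Aconbury Camp).
point_between = ((oxford[0] + swindon[0]) / 2, (oxford[1] + swindon[1]) / 2)
print(in_both_orbits(point_between, oxford, swindon, ORBIT_RADIUS))
print(in_both_orbits((-303295, 6798973), oxford, swindon, ORBIT_RADIUS))
```

Web Mercator stretches distances by roughly 1/cos(latitude), so a radius expressed in projected units covers less ground than the same number of true metres at these latitudes.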
The recording bias seen in 800 BC to 400 BC is again visible in this cluster, and similar voids in the record can be seen across the rest of the atlas.
dt_400_50 = plot_over_grey(location_date_data, 'Dating_Date_400BC_AD50', 'Yes')
Saving figure hillforts_primer_part03-016.png
10.78%
The cluster focus is identical to that seen in 800 BC to 400 BC.
plot_density_over_grey(dt_400_50, 'Dating_Date_400BC_AD50')
Saving figure hillforts_primer_part03-017.png
The cluster focus is again a result of intensive recording within the orbits of Oxford University and Historic England. Within this area it can be said that, of the forts with dates, those with a 400 BC to AD 50 date are the most common.
date_400bc_50ad = plot_over_grey(location_date_data, 'Dating_Date_400BC_AD50', 'Yes', '', False, False, True, True)
Saving figure hillforts_primer_part03-018.png
10.78%
The distribution of forts in the AD 50 to AD 400 range will also be biased because of the recording focus mentioned above. It is notable that there are no records for this period in Ireland. Because of the bias in the dating records mentioned in 400BC - AD50 Mapped Plus Oxford and Swindon Orbits, a density plot for this period has not been produced.
dt_50_400 = plot_over_grey(location_date_data, 'Dating_Date_AD50_AD400', 'Yes')
Saving figure hillforts_primer_part03-019.png
4.61%
A small number of hillforts have dates in the AD 400 to AD 800 range. Interestingly, most of these are outside the bias seen in the dating focus mentioned in 400BC - AD50 Mapped Plus Oxford and Swindon Orbits. This may be an observation that is meaningful for the south of England.
dt_400_800 = plot_over_grey(location_date_data, 'Dating_Date_AD400_AD800', 'Yes')
Saving figure hillforts_primer_part03-020.png
2.07%
A small number of hillforts have dates post AD 800. Again, most are outside the bias seen in the dating focus mentioned in 400BC - AD50 Mapped Plus Oxford and Swindon Orbits. This may be an observation that is meaningful for the south of England.
dt_800 = plot_over_grey(location_date_data, 'Dating_Date_Post_AD800', 'Yes')
Saving figure hillforts_primer_part03-021.png
1.25%
Only 14.5% of hillforts have dating information, and the majority of these are in the south of England.
dt_known = plot_over_grey(location_date_data, 'Dating_Date_Unknown', 'No', 'No (Known)', False, False, True, True)
Saving figure hillforts_primer_part03-022.png
14.49%
Most hillforts (85.5%) have no dating evidence.
dt_unknown = plot_over_grey(location_date_data, 'Dating_Date_Unknown', 'Yes')
Saving figure hillforts_primer_part03-023.png
85.51%
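The two percentages above (14.49% and 85.51%) are complementary shares of the same 'Dating_Date_Unknown' column. As a minimal sketch of how such a share can be calculated, the hypothetical helper `percent_matching` below counts the rows matching a value and divides by the total. It is an illustration on toy data and is not the code inside `plot_over_grey`, which is defined elsewhere in the notebook.

```python
import pandas as pd


def percent_matching(df, feature, value):
    """Return the percentage of rows where `feature` equals `value` (2 d.p.)."""
    return round(100 * (df[feature] == value).sum() / len(df), 2)


# Toy data standing in for the full atlas (assumption for illustration).
toy = pd.DataFrame({'Dating_Date_Unknown': ['Yes', 'Yes', 'No', 'Yes']})

unknown_pct = percent_matching(toy, 'Dating_Date_Unknown', 'Yes')
known_pct = percent_matching(toy, 'Dating_Date_Unknown', 'No')
```

Because every record is either 'Yes' or 'No', the two results always sum to 100.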
There are two features containing null values
dating_encodeable_data[['Dating_Date_Reliability','Related_Dating_Evidence']].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4147 entries, 0 to 4146
Data columns (total 2 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Dating_Date_Reliability  4134 non-null   object
 1   Related_Dating_Evidence  805 non-null    object
dtypes: object(2)
memory usage: 64.9+ KB
Test for 'NA'.
test_cat_list_for_NA(dating_encodeable_data, ['Dating_Date_Reliability','Related_Dating_Evidence'])
Dating_Date_Reliability 0
Related_Dating_Evidence 0
Update null values to 'NA'.
dating_encodeable_data = update_cat_list_for_NA(dating_encodeable_data, ['Dating_Date_Reliability','Related_Dating_Evidence'])
dating_encodeable_data_review = test_numeric(dating_encodeable_data[['Dating_Date_Reliability','Related_Dating_Evidence']])
dating_encodeable_data_review
| | Feature | Entries | Numeric | Non-Numeric | Null |
|---|---|---|---|---|---|
| 0 | Dating_Date_Reliability | 4147 | 0 | 4147 | 0 |
| 1 | Related_Dating_Evidence | 4147 | 0 | 4147 | 0 |
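The null handling above relies on helper functions defined elsewhere in the notebook. As a minimal sketch of the same idea, assuming the update simply replaces nulls with the string 'NA', the hypothetical function `fill_na_label` below does this with `fillna` on a copy of the data.

```python
import numpy as np
import pandas as pd


def fill_na_label(df, features, label='NA'):
    """Return a copy of df with nulls in `features` replaced by `label`."""
    out = df.copy()
    out[features] = out[features].fillna(label)
    return out


# Toy data with nulls in both features (assumption for illustration).
toy = pd.DataFrame({
    'Dating_Date_Reliability': ['C - Low', np.nan, 'A - High'],
    'Related_Dating_Evidence': [np.nan, 'C14', np.nan],
})

filled = fill_na_label(toy, ['Dating_Date_Reliability', 'Related_Dating_Evidence'])
```

Working on a copy leaves the original frame untouched, so the null counts before and after can be compared.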
Date reliability contains five values.
pd.unique(dating_encodeable_data['Dating_Date_Reliability'])
array(['C - Low', 'D - None', 'B - Medium', 'A - High', 'NA'],
dtype=object)
The majority of hillforts have no dating reliability recorded, as most hillforts do not have dating evidence. Of those that do have dating evidence, only 62 have dates that are classified as highly reliable.
plot_bar_chart_value_counts(dating_encodeable_data['Dating_Date_Reliability'], 'Reliability', 'Count', 'Date_Reliability')
Saving figure hillforts_primer_part03-024.png
<ipython-input-53-fcc09a6f289d>:13: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect. plt.tight_layout()
Most hillforts have no reliable dating evidence.
location_dating_encodeable_data = pd.merge(location_numeric_data_short, dating_encodeable_data, left_index=True, right_index=True)
dating_reliability_none = location_dating_encodeable_data[location_dating_encodeable_data['Dating_Date_Reliability']=='D - None'].copy()
plot_over_grey_numeric(dating_reliability_none, 'Dating_Date_Reliability', 'Dating_Date_Reliability_D-None')
Saving figure hillforts_primer_part03-025.png
Of the hillforts that have dating evidence, relatively few have dating that is classified as low reliability. It should be noted that the distributions based on dating reliability suffer from the same recording bias discussed in 400BC - AD50 Mapped Plus Oxford and Swindon Orbits.
dating_reliability_low = location_dating_encodeable_data[location_dating_encodeable_data['Dating_Date_Reliability']=='C - Low'].copy()
plot_over_grey_numeric(dating_reliability_low, 'Dating_Date_Reliability', 'Dating_Date_Reliability_C-Low')
Saving figure hillforts_primer_part03-026.png
dating_reliability_med = location_dating_encodeable_data[location_dating_encodeable_data['Dating_Date_Reliability']=='B - Medium'].copy()
plot_over_grey_numeric(dating_reliability_med, 'Dating_Date_Reliability', 'Dating_Date_Reliability_B-Medium')
Saving figure hillforts_primer_part03-027.png
dating_reliability_high = location_dating_encodeable_data[location_dating_encodeable_data['Dating_Date_Reliability']=='A - High'].copy()
plot_over_grey_numeric(dating_reliability_high, 'Dating_Date_Reliability', 'Dating_Date_Reliability_A-High')
Saving figure hillforts_primer_part03-028.png
Thirteen records have no information recorded for date reliability.
dating_reliability_na = location_dating_encodeable_data[location_dating_encodeable_data['Dating_Date_Reliability']=='NA'].copy()
plot_over_grey_numeric(dating_reliability_na, 'Dating_Date_Reliability', 'Dating_Date_Reliability_NA')
Saving figure hillforts_primer_part03-029.png
When this data is encoded, the current values will be added as feature/column headings. The current format of the data, with spaces and a hyphen, could lead to problems, so it is simplified.
dating_encodeable_data['Dating_Date_Reliability'] = np.where(dating_encodeable_data['Dating_Date_Reliability'] != 'NA', dating_encodeable_data['Dating_Date_Reliability'].astype(str).str[0], dating_encodeable_data['Dating_Date_Reliability'])
pd.unique(dating_encodeable_data['Dating_Date_Reliability'])
array(['C', 'D', 'B', 'A', 'NA'], dtype=object)
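To illustrate why the shortened codes help, the sketch below repeats the simplification on toy data and then one-hot encodes the result. It assumes `pd.get_dummies` is used for the encoding, which is only one possible approach and may differ from the encoding applied later in the Hillforts Primer.

```python
import pandas as pd

# Toy series containing the five original values.
reliability = pd.Series(
    ['C - Low', 'D - None', 'B - Medium', 'A - High', 'NA'],
    name='Dating_Date_Reliability',
)

# Keep 'NA' as it is; otherwise keep only the leading letter.
simplified = reliability.where(reliability == 'NA', reliability.str[0])

# One-hot encode: each value becomes a column heading with the given prefix.
encoded = pd.get_dummies(simplified, prefix='Dating_Date_Reliability')
```

The resulting headings take the form 'Dating_Date_Reliability_A', with no spaces or hyphens to escape in later processing.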
This feature records if there is activity on site prior to the construction of the hillfort.
dating_pre_stats = plot_over_grey(location_dating_encodeable_data, 'Dating_Pre', 'Yes', '')
Saving figure hillforts_primer_part03-030.png
11.84%
This feature records whether there is activity on site after the abandonment of the hillfort.
dating_post_stats = plot_over_grey(location_dating_encodeable_data, 'Dating_Post', 'Yes', '')
Saving figure hillforts_primer_part03-031.png
48.54%
Only the south has sufficient data to produce a meaningful chart. The southern data shows a peak between 400 BC and AD 50, with high concentrations of datable material on these sites running from 800 BC to AD 400. The dating in this region is predominantly derived from artefactual analysis. See: Map Related Dating Artefactual.
The Northeast and Northwest show a similar distribution of dates but differ in that they show signs of continued use beyond AD 800. Caution is needed, as this distribution of dates is based on a relatively small dataset.
The Irish data is quite different. There is a large peak between 1200 BC and 800 BC, with the North of Ireland having a similarly large peak for dates pre 1200 BC. There are almost no dates in the North of Ireland between 800 BC and 400 BC and only a small peak in South Ireland. There are very few dates in North or South Ireland between AD 50 and AD 400, and then increased activity, in terms of dating, from AD 400 on to post AD 800. There are very few dates for hillforts in Ireland. It is highly likely that the distribution of dates just discussed could change radically as more dates become available. Extreme caution is needed when interpreting the distribution and spread of Irish dates.
location_enclosing_data_nw_dates = pd.merge(north_west, date_data, left_index=True, right_index=True)
location_enclosing_data_ne_dates = pd.merge(north_east, date_data, left_index=True, right_index=True)
location_enclosing_data_ireland_n_dates = pd.merge(north_ireland, date_data, left_index=True, right_index=True)
location_enclosing_data_ireland_s_dates = pd.merge(south_ireland, date_data, left_index=True, right_index=True)
location_enclosing_data_south_dates = pd.merge(south, date_data, left_index=True, right_index=True)
plot_dates_by_region(location_enclosing_data_nw_dates,location_enclosing_data_ne_dates,location_enclosing_data_ireland_n_dates,location_enclosing_data_ireland_s_dates,location_enclosing_data_south_dates, date_features)
Saving figure hillforts_primer_part03-032.png
<ipython-input-53-fcc09a6f289d>:13: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect. plt.tight_layout()
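The regional comparison rests on tallies of 'Yes' values per date feature within each region. As a minimal sketch of that tally, assuming each regional table holds the date features as 'Yes/No' columns, the hypothetical helper `yes_counts` below counts them on toy data. It is not the implementation of `plot_dates_by_region`, which is defined elsewhere in the notebook.

```python
import pandas as pd

# Two of the date features, used here as a short example list.
date_cols = ['Dating_Date_800BC_400BC', 'Dating_Date_400BC_AD50']


def yes_counts(region_df, features):
    """Count the 'Yes' values in each feature for one region."""
    return (region_df[features] == 'Yes').sum()


# Toy regional tables (assumptions for illustration).
south_toy = pd.DataFrame({
    'Dating_Date_800BC_400BC': ['Yes', 'No', 'Yes'],
    'Dating_Date_400BC_AD50': ['Yes', 'Yes', 'Yes'],
})
ne_toy = pd.DataFrame({
    'Dating_Date_800BC_400BC': ['No', 'No'],
    'Dating_Date_400BC_AD50': ['Yes', 'No'],
})

# One column per region, one row per date feature.
summary = pd.DataFrame({
    'South': yes_counts(south_toy, date_cols),
    'Northeast': yes_counts(ne_toy, date_cols),
})
```

Printing the raw counts alongside any chart makes it easier to see where a regional peak rests on only a handful of records.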
list(pd.unique(dating_encodeable_data['Related_Dating_Evidence']))
['Artefactual', 'NA', 'Morphology/Earthwork/Typology', 'Artefactual; C14; Morphology/Earthwork/Typology', 'Artefactual; C14', 'C14', 'Artefactual; Morphology/Earthwork/Typology', 'Morphology/Earthwork/Typology; Other', 'C14; Morphology/Earthwork/Typology', 'Other', 'Artefactual; Other', 'Artefactual; C14; Other', 'C14; Morphology/Earthwork/Typology; Other', 'C14; Other', 'Artefactual; C14; Artefactual; C14', 'Artefactual; Morphology/Earthwork/Typology; Other']
A gazetteer of five terms (classes) has been used - see below. These terms include 'NA', which was added by this study to replace null values. One hillfort may be associated with multiple dating classes. In the distributions that follow, the same locations may occur in multiple dating plots.
related_dating_terms = ['Artefactual', 'NA', 'Morphology/Earthwork/Typology', 'C14', 'Other']
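The list of terms can also be derived from the data rather than typed by hand. The sketch below is an illustration on a small sample of the values listed above. It assumes that terms within a record are always separated by '; ', splits each record on that separator, and collects the unique terms.

```python
import pandas as pd

# A sample of the combined values seen in 'Related_Dating_Evidence'.
evidence = pd.Series([
    'Artefactual',
    'NA',
    'Artefactual; C14; Morphology/Earthwork/Typology',
    'C14; Other',
    'Artefactual; C14; Artefactual; C14',
])

# Split each record into its terms, flatten, and keep the unique terms.
terms = sorted(evidence.str.split('; ').explode().unique())
```

Deriving the list this way also confirms that a repeated entry such as 'Artefactual; C14; Artefactual; C14' adds no new class.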
To enable the Related Dating Evidence to be interrogated more simply, five new, boolean, 'Yes/No' columns will be added to the encodeable data. Initially, they will be set to 'No' and then updated to 'Yes' if the term is found in the current 'Related_Dating_Evidence' feature. Note that the forward slash '/' will be removed from the column heading as this can cause problems.
dating_encodeable_data_plus = dating_encodeable_data.copy()
additiaonal_related_dating_features = ['Related_Dating_Artefactual', 'Related_Dating_NA', 'Related_Dating_Morph_Earth_Typo', 'Related_Dating_C14', 'Related_Dating_Other']
for feature in additiaonal_related_dating_features:
    dating_encodeable_data_plus[feature] = 'No'
dating_encodeable_data_plus[additiaonal_related_dating_features].head()
| | Related_Dating_Artefactual | Related_Dating_NA | Related_Dating_Morph_Earth_Typo | Related_Dating_C14 | Related_Dating_Other |
|---|---|---|---|---|---|
| 0 | No | No | No | No | No |
| 1 | No | No | No | No | No |
| 2 | No | No | No | No | No |
| 3 | No | No | No | No | No |
| 4 | No | No | No | No | No |
Populate the 'Artefactual' column.
dating_encodeable_data_plus.loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('Artefactual', case=False), 'Related_Dating_Artefactual'] = 'Yes'
dating_encodeable_data_plus[['Related_Dating_Evidence','Related_Dating_Artefactual']].loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('Artefactual', case=False)][5:10]
| | Related_Dating_Evidence | Related_Dating_Artefactual |
|---|---|---|
| 11 | Artefactual; C14; Morphology/Earthwork/Typology | Yes |
| 12 | Artefactual | Yes |
| 14 | Artefactual | Yes |
| 16 | Artefactual; C14 | Yes |
| 18 | Artefactual | Yes |
Populate the 'Morphology/Earthwork/Typology' column.
dating_encodeable_data_plus.loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('Morphology/Earthwork/Typology', case=False), 'Related_Dating_Morph_Earth_Typo'] = 'Yes'
dating_encodeable_data_plus[['Related_Dating_Evidence','Related_Dating_Morph_Earth_Typo']].loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('Morphology/Earthwork/Typology', case=False)].head()
| | Related_Dating_Evidence | Related_Dating_Morph_Earth_Typo |
|---|---|---|
| 4 | Morphology/Earthwork/Typology | Yes |
| 11 | Artefactual; C14; Morphology/Earthwork/Typology | Yes |
| 39 | Artefactual; Morphology/Earthwork/Typology | Yes |
| 47 | Morphology/Earthwork/Typology; Other | Yes |
| 60 | C14; Morphology/Earthwork/Typology | Yes |
Populate the 'C14' column.
dating_encodeable_data_plus.loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('C14', case=False), 'Related_Dating_C14'] = 'Yes'
dating_encodeable_data_plus[['Related_Dating_Evidence','Related_Dating_C14']].loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('C14', case=False)].head()
| | Related_Dating_Evidence | Related_Dating_C14 |
|---|---|---|
| 11 | Artefactual; C14; Morphology/Earthwork/Typology | Yes |
| 16 | Artefactual; C14 | Yes |
| 21 | Artefactual; C14 | Yes |
| 23 | C14 | Yes |
| 60 | C14; Morphology/Earthwork/Typology | Yes |
Populate the 'Other' column.
dating_encodeable_data_plus.loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('Other', case=False), 'Related_Dating_Other'] = 'Yes'
dating_encodeable_data_plus[['Related_Dating_Evidence','Related_Dating_Other']].loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('Other', case=False)].head()
| | Related_Dating_Evidence | Related_Dating_Other |
|---|---|---|
| 47 | Morphology/Earthwork/Typology; Other | Yes |
| 65 | Other | Yes |
| 70 | Other | Yes |
| 75 | Other | Yes |
| 154 | Artefactual; Other | Yes |
Populate the 'NA' column.
dating_encodeable_data_plus.loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('NA', case=False), 'Related_Dating_NA'] = 'Yes'
dating_encodeable_data_plus[['Related_Dating_Evidence','Related_Dating_NA']].loc[dating_encodeable_data_plus['Related_Dating_Evidence'].str.contains('NA', case=False)].head()
| | Related_Dating_Evidence | Related_Dating_NA |
|---|---|---|
| 1 | NA | Yes |
| 2 | NA | Yes |
| 5 | NA | Yes |
| 8 | NA | Yes |
| 9 | NA | Yes |
Review a sample of the new columns to confirm the features are as expected.
dating_encodeable_data_plus[['Related_Dating_Evidence']+additiaonal_related_dating_features][6:12]
| | Related_Dating_Evidence | Related_Dating_Artefactual | Related_Dating_NA | Related_Dating_Morph_Earth_Typo | Related_Dating_C14 | Related_Dating_Other |
|---|---|---|---|---|---|---|
| 6 | Artefactual | Yes | No | No | No | No |
| 7 | Artefactual | Yes | No | No | No | No |
| 8 | NA | No | Yes | No | No | No |
| 9 | NA | No | Yes | No | No | No |
| 10 | Artefactual | Yes | No | No | No | No |
| 11 | Artefactual; C14; Morphology/Earthwork/Typology | Yes | No | Yes | Yes | No |
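The five population steps above follow the same pattern, so they can be expressed as a single loop. The sketch below is an alternative formulation on toy data, not the notebook's code. It differs in one respect: it splits each record on '; ' and tests for an exact term, where the steps above use a case-insensitive substring search. The mapping `TERM_TO_COLUMN` and the function `add_term_flags` are hypothetical names.

```python
import pandas as pd

# Gazetteer term -> new 'Yes/No' column (names follow the columns above).
TERM_TO_COLUMN = {
    'Artefactual': 'Related_Dating_Artefactual',
    'NA': 'Related_Dating_NA',
    'Morphology/Earthwork/Typology': 'Related_Dating_Morph_Earth_Typo',
    'C14': 'Related_Dating_C14',
    'Other': 'Related_Dating_Other',
}


def add_term_flags(df, source='Related_Dating_Evidence'):
    """Return a copy of df with one 'Yes/No' column per gazetteer term."""
    out = df.copy()
    tokens = out[source].str.split('; ')  # list of terms per record
    for term, column in TERM_TO_COLUMN.items():
        has_term = tokens.apply(lambda parts: term in parts)
        out[column] = has_term.map({True: 'Yes', False: 'No'})
    return out


# Toy records (assumption for illustration).
toy = pd.DataFrame({'Related_Dating_Evidence': [
    'Artefactual',
    'NA',
    'Artefactual; C14; Morphology/Earthwork/Typology',
]})

flags = add_term_flags(toy)
```

Exact term matching avoids accidental hits if a future term happened to contain the letters of another, at the cost of relying on a consistent separator.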
The information in 'Related_Dating_Evidence' has now been migrated to the new features, so the original feature can now be deleted.
dating_encodeable_data_plus = dating_encodeable_data_plus.drop(['Related_Dating_Evidence'], axis=1)
dating_encodeable_data_plus.head()
| | Dating_Date_Pre_1200BC | Dating_Date_1200BC_800BC | Dating_Date_800BC_400BC | Dating_Date_400BC_AD50 | Dating_Date_AD50_AD400 | Dating_Date_AD400_AD800 | Dating_Date_Post_AD800 | Dating_Date_Unknown | Dating_Date_Reliability | Dating_Pre | Dating_Post | Related_Dating_Artefactual | Related_Dating_NA | Related_Dating_Morph_Earth_Typo | Related_Dating_C14 | Related_Dating_Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No | No | Yes | Yes | Yes | No | No | No | C | No | Yes | Yes | No | No | No | No |
| 1 | No | No | No | No | No | No | No | Yes | D | No | No | No | Yes | No | No | No |
| 2 | No | No | No | No | No | No | No | Yes | D | No | No | No | Yes | No | No | No |
| 3 | No | No | No | No | Yes | Yes | No | No | B | Yes | Yes | Yes | No | No | No | No |
| 4 | No | No | Yes | Yes | Yes | No | Yes | No | C | No | Yes | No | No | Yes | No | No |
Although most dated hillforts are dated using artefactual evidence, only 13.12% of hillforts have been dated in this way.
location_dating_encodeable_plus_data = pd.merge(location_numeric_data_short, dating_encodeable_data_plus, left_index=True, right_index=True)
related_dating_artefactual_stats = plot_over_grey(location_dating_encodeable_plus_data, 'Related_Dating_Artefactual', 'Yes', '')
Saving figure hillforts_primer_part03-033.png
13.12%
There is a strong concentration of artefactual dates in south-central England.
plot_density_over_grey(related_dating_artefactual_stats, 'Related_Dating_Artefactual')
Saving figure hillforts_primer_part03-034.png
When this cluster is plotted against the orbits of Oxford University and the head office of Historic England in Swindon, there is a strong correlation between the two, suggesting that this distribution highlights a significant recording bias.
date_artif = plot_over_grey(location_dating_encodeable_plus_data, 'Related_Dating_Artefactual', 'Yes', 'plus Swindon & Oxford', False, False, True, True)
Saving figure hillforts_primer_part03-035.png
13.12%
Dating by means of morphology, earthwork and typology has a similar bias toward south-central England and toward the northern border of Northumberland. Only 5.59% of hillforts have been dated in this way. Notably, none are in Ireland.
met_stats = plot_over_grey(location_dating_encodeable_plus_data, 'Related_Dating_Morph_Earth_Typo', 'Yes', '')
Saving figure hillforts_primer_part03-036.png
5.59%
plot_density_over_grey(met_stats, 'Related_Dating_Morph_Earth_Typo')
Saving figure hillforts_primer_part03-037.png
Carbon 14 dating (C14) is the most scientifically rigorous of the dating techniques recorded in the atlas. Only 5.09% of hillforts have a C14 date, but there does look to be a more even distribution of dates across the area of the atlas.
c14_stats = plot_over_grey(location_dating_encodeable_plus_data, 'Related_Dating_C14', 'Yes', '')
Saving figure hillforts_primer_part03-038.png
5.09%
There is a cluster of dates in the eastern Scottish lowlands, particularly around Traprain Law in East Lothian and along the line of the Gask Ridge. There is another thin concentration along the northern Welsh/English border, from the Shropshire Hills to the edge of Snowdonia.
plot_density_over_grey(c14_stats, 'Related_Dating_C14')
Saving figure hillforts_primer_part03-039.png
Forty-nine hillforts (1.18%) are identified as having 'other' dating evidence. No further information is available via the online data.
dating_other_stats = plot_over_grey(location_dating_encodeable_plus_data, 'Related_Dating_Other', 'Yes', '')
Saving figure hillforts_primer_part03-040.png
1.18%
Most hillforts (80.59%) have no related dating evidence recorded.
dating_na_stats = plot_over_grey(location_dating_encodeable_plus_data, 'Related_Dating_NA', 'Yes', '')
Saving figure hillforts_primer_part03-041.png
80.59%
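The 80.59% figure can be cross-checked against the earlier `info()` output, which reported 4147 records and 805 non-null 'Related_Dating_Evidence' entries. The short calculation below is a worked check using those two published counts and the forty-nine 'other' records. The variable names are illustrative.

```python
total_records = 4147          # RangeIndex size reported by info()
with_related_evidence = 805   # non-null 'Related_Dating_Evidence' entries
other_records = 49            # records with 'other' dating evidence

# Records without related dating evidence, as a percentage of all records.
pct_no_evidence = round(
    100 * (total_records - with_related_evidence) / total_records, 2
)

# Records with 'other' dating evidence, as a percentage of all records.
pct_other = round(100 * other_records / total_records, 2)
```

Both results agree with the percentages printed beneath the corresponding plots.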
dating_data_list = [dating_numeric_data, dating_text_data, dating_encodeable_data_plus]
If you do not wish to download the data using this document, all the processed data packages, notebooks and images are available here:
https://github.com/MikeDairsie/Hillforts-Primer.
download(dating_data_list, 'Dating_package')
if save_images:
    path = os.path.join(IMAGES_PATH, f"fig_list_{part.lower()}.csv")
    fig_list.to_csv(path, index=False)